PREFACE



Published 

and locally-produced tests are used in collecting data. How should data col- 



are participating in an experiment?

(c) score the test? Are the results the same whether teachers or "outsiders"
administer the test? These are questions of high significance to educational
researchers. Through an excellent design developed by William Goodwin and
unstinting cooperation by a large midwest school system, answers to the ques- 
tions were sought and obtained. Theresean 
and procedures as carefully as the results. 



Herbert J. Klausmeier
Co-Director for Research 
ABSTRACT



In classroom experimentation, the instruments used for data collection are 
often commercial or specially-designed tests. The researcher is faced with
selecting the means of administering the tests that will maximize the validity
and generalizability of any conclusions reached. However, the complex interaction of
many variables in the classroom often eludes the experimenter, and the result- 
ing uncontrolled variation frequently causes a finding of "no significant differ- 
ence" or of spurious significance. 

Four null hypotheses were tested to determine the differential effects of: 

1. Experimental atmosphere and absence of same; 

2. Notice of test (10 school days) and no notice (one school day); 

3. Teacher administration and outside administration of the test; and 

4. Teacher scoring and outside scoring of the test. 

The experimental unit was the classroom. Sixty-four sixth-grade classes, 
each from a different school in a large midwestern city, were ranked and grouped 
into four strata on the basis of previous arithmetic achievement. Within each 
stratum, classes were randomly assigned to one of the 16 experimental treatments
generated by a 2^4 factorial design using the four independent variables as listed
above and in connection with a recent arithmetic achievement test as a response 
measure. 

Experimental atmosphere was created using written instructions, and test
notice was given by mail. The outside test administrators and scorers were
graduate and undergraduate students. Resulting class means of the 64 classes
for each of the three sub-tests in the exam were subjected to a 4 x 2^4 analysis
of variance. The error term was composed by pooling selected higher-order
interactions. 
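
The assignment procedure summarized above can be illustrated with a short
sketch. It is not the procedure actually used in the study: the factor names,
the function assign_classes, the made-up achievement scores, and the
assumption that the four strata are of equal size (16 classes each, one per
treatment) are all introduced here for illustration only.

```python
import random
from itertools import product

# Hypothetical labels for the four two-level independent variables.
FACTORS = ("experimental_atmosphere", "advance_notice",
           "teacher_administers", "teacher_scores")


def assign_classes(prior_scores, n_strata=4, seed=0):
    """Rank classes on prior achievement, cut them into equal strata, and
    randomly assign the 16 treatment combinations within each stratum.

    prior_scores maps a class id to a prior achievement score.
    Returns a dict: class id -> (stratum number, {factor: level}).
    """
    rng = random.Random(seed)
    # All 2^4 = 16 combinations of the four two-level factors.
    treatments = list(product((False, True), repeat=len(FACTORS)))
    ranked = sorted(prior_scores, key=prior_scores.get, reverse=True)
    if len(ranked) != n_strata * len(treatments):
        raise ValueError("need exactly one class per stratum-by-treatment cell")

    size = len(treatments)
    assignment = {}
    for s in range(n_strata):
        stratum = ranked[s * size:(s + 1) * size]
        shuffled = treatments[:]
        rng.shuffle(shuffled)  # random order of treatments within the stratum
        for class_id, combo in zip(stratum, shuffled):
            assignment[class_id] = (s + 1, dict(zip(FACTORS, combo)))
    return assignment


# Made-up prior achievement scores for 64 hypothetical classes.
scores = {f"class_{i:02d}": 4.0 + 0.05 * i for i in range(64)}
assignment = assign_classes(scores)
```

Under these assumptions every stratum receives each of the 16 treatments
exactly once, so each level of each factor is represented by 32 classes.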

Tests of the main effects revealed significantly higher class means on one 
of the three sub-tests for those classes receiving 10 school days' notice of the
upcoming test and significantly higher class means on all three sub-tests for 
those classes whose regular teacher administered the test. Several two-factor 
interactions were significant, most notably the combination of experimental at- 
mosphere and notice of testing producing higher grade placements than the com- 
bination of no experimental atmosphere and no notice. 

The conclusions reached were: 

1. Advance notice of test date has a significant facilitating effect on pupil test 
performance if the test includes novel concepts easily taught.

2. Teacher administration of standardized tests has a significant facilitating ef- 
fect on pupil test performance as compared with administration of the tests 
by outside personnel. 

3. Experimental atmosphere combined with notice of testing results in signifi- 
cantly higher pupil test performance if the test includes novel concepts easily 
taught. 

4. No notice of testing combined with outside scoring results in significantly 
lower pupil test performance. 

5. Outside scorers produced higher grade placements than teacher-scorers in
high achieving classes, while teacher-scorers produced higher grade place- 
ments than outside scorers in low achieving classes. 
INTRODUCTION TO THE PROBLEM



The importance of controlled experimentation
in school settings is gradually gaining
acceptance among educators. In classroom
experimentation, the instruments used for data
collection are often commercial or
specially-designed tests. The researcher is faced
with selecting the means of administering the
tests that will maximize the validity and
generalizability of any conclusions reached.
However, the complex interaction of many
variables in the classroom often eludes the
experimenter, and the resulting uncontrolled
variation frequently causes a finding of "no
significant difference" or of spurious significance.

One of the most critical variables in class- 
room experimentation is the teacher. In addi- 
tion to his primary task of administering the 
experimental treatment, the teacher is often 
asked to collect the response data which will 
be used to evaluate the experimental treatments. 
The practice of letting classroom teachers ad- 
minister tests in a research project is fairly 
widespread because it is convenient and
inexpensive. Bringing in "outsiders" to do the
testing is more costly and might cause teacher
resentment. Some persons insist that pupils
will not perform up to capacity unless their
regular teacher gives the test. In a research
project, however, the objective is not to elicit
the best possible performance from each
individual pupil, aided by his classroom teacher.
Instead the emphasis is upon the necessity of
testing each class under identical conditions,
insofar as possible.

Most researchers conducting studies within 
the classroom use the teacher to administer 
treatments or to assist in varying ways; in ef- 
fect, the teacher is cast in the role of a sub- 
experimenter. Many conjectures have been 
made as to the effect of sub-experimenters on 
experimental results, but little systematic ob- 
servation or measurement of such effects has 
occurred. Of special concern is the effect that 
"being in an experiment" has on these sub- 
experimenters. 



Specifically, the problem to be researched 
has four facets: What is the effect of varying 
conditions of experimental atmosphere, notice 
of testing, test administrator (teacher or "out- 
sider"), and test scorer (teacher or "outsider") 
upon student performance as measured by test 
results? Expressed in the null form, the four
hypotheses to be tested are: 

1. There is no significant difference in test 
performance between pupils whose teachers
believe an experiment is in progress and
pupils whose teachers do not so believe.

2. There is no significant difference in test
performance between pupils whose teachers 
receive notice of the test date and pupils 
whose teachers do not receive notice. 

3. There is no significant difference in test 
performance between pupils whose regular 
teacher administers the test and pupils who 
are tested by an "outside" administrator. 

4. There is no significant difference in test 
performance between pupils whose regular 
teachers score the test and pupils whose 
teachers do not. 

Also of interest will be the testing of the
two-factor interactions and the interpretation
of those which are significant. Ten two-factor
interactions are generated by the five factors
in the study: four independent variables as
implied in the hypotheses above and a single
leveling variable, previous arithmetic
achievement. It would be laborious and somewhat
redundant to list the null hypotheses related to
the 10 two-factor interactions.
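
Rather than listing them by hand, the ten two-factor interactions can be
enumerated mechanically. The sketch below is illustrative only: the factor
names are labels chosen here, and the degrees-of-freedom bookkeeping assumes
one class mean per cell of the 4 x 2^4 layout (64 classes in 64 cells). It
does not show which higher-order interactions were pooled into the error
term, since that selection is not specified here.

```python
from itertools import combinations
from math import prod

# Number of levels per factor: four achievement strata and four
# two-level independent variables (names are hypothetical labels).
levels = {
    "achievement_stratum": 4,
    "experimental_atmosphere": 2,
    "notice_of_testing": 2,
    "test_administrator": 2,
    "test_scorer": 2,
}


def interaction_df(factors):
    """Degrees of freedom of an effect: product of (levels - 1)."""
    return prod(levels[f] - 1 for f in factors)


# All pairs of the five factors.
two_factor = list(combinations(levels, 2))
for pair in two_factor:
    print(" x ".join(pair), "df =", interaction_df(pair))

n_cells = prod(levels.values())   # number of cells in the 4 x 2^4 design
total_df = n_cells - 1

# Degrees of freedom summed over all effects of each order (1 = main effects).
df_by_order = {
    k: sum(interaction_df(c) for c in combinations(levels, k))
    for k in range(1, len(levels) + 1)
}
print(len(two_factor), "two-factor interactions;", df_by_order)
```

With one observation per cell there is no within-cell variance, which is why
an error term has to be built from interaction terms of third order and above.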

The importance of the problem to educational
psychology is readily apparent. Generalizable
findings on the problem would provide guidelines
to be followed by educational psychologists in
order to enhance the meaningfulness and effect
of their classroom experimentation. The
hypotheses to be tested are of such a nature that
some persons might infer that the honesty of
the classroom teacher is being questioned and
investigated. This is not the case. No
deliberate or conscious effort on the teacher's
part to unethically aid his pupils is being
suggested. Any teacher who is psychologically
healthy knows and likes his students, and he
sincerely and naturally desires that they perform
and achieve well. This desire of the teacher can
produce unconscious motivations and even acts
that significantly assist the pupils in their
classroom endeavors, such as taking tests.

No less salient than consideration of the
teacher variable is another warning for the
experimenter: in his zeal to avoid sources of
bias, he must carefully go about the recruiting
of experimental assistants. If the experimenter
is going to use "outsiders" to administer tests
at the conclusion of an experiment, what
assurances has he that this is not a biased
group? Has the experimenter seen or talked
to them individually or as a group? Has he
discussed the experiment with any of them?
Have any of them read previous studies by the
researcher, studies from which one could
accurately infer the variables currently being
investigated?

It is seldom possible to deal with all the
significant problems germane to a specified
area in a single study; certain problems are not
investigated by this experiment. One
significant question not considered is: what is the
difference in test performance between pupils
who have the test administered by research
personnel (outsiders) who believe that an
experiment is in progress and pupils whose
outside test administrators do not so believe?
The question was not brought under
investigation in this study because an attempt to answer
it concurrently might have jeopardized the test
of the first hypothesis (in that it was deemed
necessary to give the outside administrator
information regarding experimental atmosphere
that in no way contradicted the information
given the teacher whose pupils the outsider
was testing).

Another significant question is: what is the
difference in scoring performance between
scorers who believe that the data were gathered
in an important experiment and scorers who do
not so believe? This question is an important
one and is obviously related to the area
investigated by the four hypotheses above. By a
process of "double scoring" the tests that were
scored by outsiders, it was possible to answer
this question as it pertains to amount of time
spent scoring, errors committed, and average
scores tabulated. This aspect of the
investigation, because of its tangential nature, is not
included in this technical report, but is
available in another source (Goodwin, 1965).



II 

REVIEW OF THE LITERATURE 



The literature related to this problem is not 
definitive. Although many educators have 
spoken of various aspects of the problem, those 
seeing fit to publish their beliefs are few in- 
deed. The literature will be considered in four 
parcels (corresponding to the four independent 
variables under investigation).



EXPERIMENTAL ATMOSPHERE 

The literature available on this subject can 
be divided into the effect of participating in an 
experiment (1) on subjects and (2) on
experimenters themselves.

The motivational influence of being subjects 
in an experiment has been dubbed the "Haw- 
thorne Effect." At Western Electric's Hawthorne 
plant in Chicago, a series of research investi- 
gations was carried out in the late 1920's and 
early 1930's. The attention given to the work- 
ers as experimental subjects was evaluated as 
one variable causing high production on the 
part of the employees, regardless of the vary- 
ing work conditions established (Mayo, 1945; 
Roethlisberger, 1941; and Roethlisberger and
Dickson, 1941). 

Interest in, and casual references to, the 
Hawthorne Effect have been in evidence for 
several decades. Recently the subject has 
been taken under investigation in a U.S. Office 
of Education Cooperative Research Project at
Ohio State University. The director of the pro- 
ject has written on the relationships between 
the Hawthorne Effect and research in education 
(Cook, 1962). Note the definition formulated:

The Hawthorne effect is a phenomenon char- 
acterized by an awareness on the part of the 
subjects of special treatment created by 
artificial experimental conditions. This 
awareness becomes confounded with the in- 
dependent variable under study, with a sub- 
sequent facilitating effect on the dependent 
variable, thus leading to ambiguous results 
(Cook, 1962, p. 118). 



Cook writes of how the Hawthorne Effect has 
plagued and confounded educational research. 

The bias of experimental subjects has been 
alluded to in an article by Orne (1962). In a 
pilot project, Orne attempted to design tasks 
which subjects would refuse to do, or would 
tire of quickly; the tasks developed were 
noxious, boring, meaningless, and/or ridiculous.
The most usual result was a heroic perseverance
on the part of the subject. Post-experiment
questionnaires indicated that the
subjects ascribed meaning to their performance, 
visualizing it as a test of endurance or the like. 

Orne marveled at the willing, almost cheerful
compliance of the experimental subject.
The subject, in Orne's estimation, is concerned 
with two sets of variables in an experimental 
situation. First, there are the variables es- 
tablished by the instructions or experimental 
task itself. The second set of variables is 
labeled "demand characteristics" by Orne. 
This concept envisions the subject as taking 
it upon himself to "figure out" the hypotheses 
being tested in the experiment, and he more or 
less actively seeks cues to achieve this end. 
In Orne's words, ". . . the totality of cues 
which convey an experimental hypothesis to 
the subject become significant determinants of 
subjects' behavior" (Orne, 1962, p. 779). 
Possible cues are rumors about the research, 
the information conveyed when the subject is 
asked to participate, the person of the experi- 
menter, the laboratory setting, and implicit 
and explicit communications during the experi- 
ment (Sarason and Minard, 1963). Obviously, 
the sophistication, intelligence, and experience
of subjects vary and these, in turn, will
partly determine the demand characteristics of 
a given experimental situation. For our purpose 
here, suffice it to say that the subject is alert 
and susceptible to bias from many possible 
sources; indeed, the phenomenon is a possible
and plausible explanation of why many investi- 
gators attempting to replicate earlier experi- 
ments are unable to do so. 

This entire area of the effect on subjects of
participating in experiments has been treated
systematically in relation to experimental
designs in education. Under the heading of
reactivity, Campbell and Stanley (1963) discuss the
effects of certain experimental arrangements
"which would preclude generalization about
the effect of the experimental variable upon
persons being exposed to it in nonexperimental
settings." These authors feel that the proper
or natural arrangement of experimental
conditions will result in the subject's being unaware
that an experiment is in progress.

The effect of the experiment upon the
experimenter himself, rather than upon the
subject, is considered extensively in a recent
journal article (Kintz et al., 1965) and in the
work of Rosenthal. In addition to a review of
the major articles in the area (1963), Rosenthal
has attacked the problem systematically on
many fronts in his own research. In a number
of interesting discussions and experiments,
Rosenthal and his associates have shown that
experimenter bias must be considered a
variable of importance, even in experiments
involving the performance of albino rats and
planaria (Rosenthal and Fode, 1961; Rosenthal
and Halas, 1962). In the experiment involving
rats, it is pointed out that the experimenter
need not be obvious in his attitude toward a
subject's performance in order to influence, and
therefore bias, the subject's actions. In other
words, the experimenter can influence the
outcome without slamming a rat into his home cage
after a poor run or giving another a pat or two for
a good performance. Rather the experimenter's
attitude may be mediated to the subject much
more subtly via changes in the experimenter's
temperature, skin moisture, etc., as he watches
what he considers a good or poor performance
by the animal. The comparative sophistication
and intelligence of human subjects would
suggest a proclivity on their part toward active
analysis of any attitudes subtly implied by the
experimenter through words, gestures, etc. In
one crucial experiment, it was shown that
research assistants easily can be affected by the
bias of their employer (Rosenthal et al., 1963).

In his summary article (1963), Rosenthal
concludes that "experimenter outcome-orientation
bias is both a fairly general and a
fairly robust phenomenon." In an interesting
passage, he states:

But perhaps the most compelling and the
most general conclusion to be drawn is that
human beings can engage in highly effective
and influential unprogrammed and unintended
communication with one another. The
subtlety of this communication is such that
casual observation of human dyads is
unlikely to reveal the nature of this
communication process. Sound motion pictures may
provide the necessary opportunity for more
leisurely, intensive, and repeated study of
subtle, influential communication processes.
We have obtained sound motion picture
records of 28 experimenters each interacting
with several subjects. ... In these films,
all Es read identical words to their Ss so
that the burden of communication falls on
the gestures, expressions, and intonations
which accompany the highly programmed
aspects of Es' inputs into the E-S interaction
(Rosenthal, 1963, p. 279).

Recent reports on the analysis of subsequently 
obtained motion pictures have generated many 
interesting hypotheses in this regard (Rosenthal, 
1965). 

McGuigan (1963) has looked at the
experimenter as an additional stimulus object. He
divided multi-experimenter experiments into
three classes. In Class I, different
experimenters do not differentially affect the results.
In Class II experiments, an experimenter varies
from others but always in a consistent direction;
for example, E1 obtains higher scores for all
groups than E2. In the third class of
experiments, the characteristics of a particular
experimenter interact with treatment conditions.
Whereas results of the first two classes are
generalizable, the results in Class III
experiments are not. McGuigan suggests that
research reports include specification of varying
results obtained by different experimenters.
In this way he hopes to control the experimenter
variable or at least increase the knowledge
about the effects of experimenters on their
subjects, and he discusses the essential ideas
behind generalizing to a population of
experimenters.
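
The three classes of multi-experimenter experiments described above can be
made concrete with a toy classifier over a table of group means. This is a
sketch under stated assumptions, not a procedure from the cited work: the
function name, the tolerance, and the example means are invented, and a
"consistent direction" is read here as a constant shift across groups (no
experimenter-by-treatment interaction). With real data, sampling error would
call for a significance test rather than an exact comparison.

```python
def classify_experimenter_effect(means, tol=1e-9):
    """Classify a table of group means by experimenter.

    means maps an experimenter label to a list of group means, with the
    groups in the same order for every experimenter.
    """
    rows = list(means.values())
    base = rows[0]
    # Difference of each other experimenter from the first, group by group.
    diffs = [[row[j] - base[j] for j in range(len(base))] for row in rows[1:]]

    # Class I: experimenters obtain the same results in every group.
    if all(abs(d) <= tol for row in diffs for d in row):
        return "Class I"
    # Class II: each experimenter differs by the same amount in every
    # group, so the treatment comparison itself is unaffected.
    if all(max(row) - min(row) <= tol for row in diffs):
        return "Class II"
    # Class III: the size or sign of the difference depends on the group,
    # i.e. experimenter interacts with treatment.
    return "Class III"


# Invented means for two experimenters and two treatment groups.
same = classify_experimenter_effect({"E1": [10.0, 14.0], "E2": [10.0, 14.0]})
shifted = classify_experimenter_effect({"E1": [10.0, 14.0], "E2": [8.0, 12.0]})
crossed = classify_experimenter_effect({"E1": [10.0, 14.0], "E2": [12.0, 9.0]})
```

In the shifted case the treatment difference is 4 points for both
experimenters, whereas in the crossed case the two experimenters would
reach opposite conclusions about which group did better.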

The teacher occupies a unique position in 
most educational research. Seldom is the 
teacher the experimenter, yet he often has the 
task of administering the experimental treat- 
ment. Thus, in educational research, added 
to the problem of experimenter bias is the po- 
tential bias displayed by the classroom teacher 
whose students make up the experimental popu- 
lation. Very often the teacher applies or ad- 
ministers the treatment; in a sense, he is a 
co- or sub-experimenter. At other times, the 
teacher is not aware that his class is in an 
experiment. The question can be asked: do
the pupils of a teacher perform differentially
depending on whether the teacher does or does
not know that his class is in an experiment?
If the pupils do, the researcher must carefully 
weigh the advantages and disadvantages of in- 
forming the teachers that an experiment is in 
progress. 

There is little if any recent research on the 
possible effects of using teachers as sub- 
experimenters. On the other hand, early ex- 
perimenters and writers warned against the 
possible contaminating influence of such a 
practice. Consider the works of two of the
first writers in the field as well as an early 
illustrative educational experiment. 

McCall (1923) felt that each teacher
consciously or unconsciously revealed to his
pupils the experimental treatment that he preferred
when his class was involved in an experiment.
The students then reacted either favorably or
unfavorably toward the teacher's preferred
treatment, depending on their personal like or
dislike of the teacher. McCall recommended
that the best way to avoid bias was to keep
those administering the treatments ignorant of
the objectives of the experiment.

Another early educator wrote in much the
same manner. Brooks, a superintendent of
schools discussed later because of his nearly
complete reliance on standardized tests for
measuring teacher merit, considered the
problem of whether or not to inform the teacher of
the experiment:

It is an open question whether or not the
teachers themselves should be informed of
the main purpose in view--that is, the
purpose of comparing the efficiency of the two
methods. If we could be perfectly sure that
both teachers would be thoroughly interested
and honest about the experiment it would
undoubtedly be wise to seek their intelligent
cooperation, since by so doing we
should be more likely to get the best possible
results from both methods. But if,
thinking their reputations are at stake, one
or both are likely to be tempted to stretch
the time limit for daily drill or to persuade
the pupils to drill themselves for speed and
accuracy outside of class, then it will probably
be better to leave them in blissful ignorance
of the main plot, merely seeing to
it that each teacher devotes the same
amount of time to class drill in the
fundamentals each day. In this way one can infer
what each of the methods would accomplish
under everyday working conditions in
the hands of equally competent teachers
(Brooks, 1921a, p. 340).



An early experiment in education provides
a good example of teacher bias ("Student,"
1931). In the Lanarkshire milk experiment,
20,000 pupils served as subjects in an attempt
to determine the value of adding milk to a
child's diet. In each of 67 schools, 200 to
400 pupils were "randomly" divided into
experimental and control groups, randomness
purportedly achieved by balloting or by an
alphabetical means. However, the design
allowed the teachers to inspect the two
resulting groups and to substitute subjects when it
appeared that the "random" procedure had
given an undue proportion of well-fed or
ill-nourished children to one group or the other.
The teachers were biased in that, given this
choice, they tended to place the smaller and
under-nourished children in the experimental
group that was to receive milk. Figures
available showed the milk group to be noticeably
shorter and lighter at the beginning of the
experiment.

Other examples of teacher bias of a similar
nature are included in other sections of this
review. In this entire review, however, a
critical fact to note is that every instance of
teacher bias comes from an early educational
experiment. The professional literature of the
past 30 years is essentially devoid of any
mention of the subject. Teachers of today, as
compared with those of the 1920-1930 era, are
better educated and more professional in many
respects. The results of early studies might not
be capable of replication today because of basic
changes in the characteristics of the American
teacher.



NOTICE OF TESTING

Somewhat less extensive is the literature 
available on the effect on test performance of
notifying teachers (and thereby their pupils) of 
the particular day on which a test will be given. 
Several articles appear on coaching for tests. 

Most articles on the coachability of tests
have been written regarding intelligence tests;
these are briefly reviewed in an article by
French and Dear (1959). In their own particular
interest, French and Dear were concerned with
the effect of coaching for the College Board's
Scholastic Aptitude Test (SAT). Although
statistically significant differences were found
when coaching for the SAT occurred, the
differences were small enough that they were of
little practical significance. An associated
investigation showed that even when items
identical to the test items were scattered
among the practice items, no substantial gain
resulted unless the items were identified as
actual test items during the coaching. French
and Dear concluded, for the SAT at least, that
a candidate would be wise not to pay for
specialized coaching but rather review and read
on his own.

In early experiments on coaching, Gilmore
(1927) found that both the experimental (or
coached) group and the control group made
substantial gains on the Otis Group Intelligence
Scale when it was readministered after a
12-week interval. In the same year, another
researcher (DeWeerdt, 1927) found that his
coached group gained more on the Illinois
Examination than did the control group. However,
the superiority was confined to the analysis,
synonym-antonym, and sentence-vocabulary
sections of the test, and was not apparent in
the verbal ingenuity section or three
arithmetic-related sections.

In an experiment more closely related to the 
present one because it did not involve coach- 
ing, one group of junior high school students 
received two days' notice of an upcoming unit 
test in science while the other group received 
no notice (Tyler and Chalmers, 1943). The 
difference in the average score favored the 
"warned" group but was less than two percent- 
age points greater than the control group and 
fell short of statistical significance. Obvious- 
ly, in this experiment, the forewarned pupils had 
fairly accurate perceptions as to the questions 
that might be asked on the test. 

Turning from experiments in the area of 
coaching for tests, consider articles touching 
on other ideas pertinent to this study. The im- 
plications of the discussion to follow overlap 
with the next two variables, administration 
and scoring of tests. As a starting point, it is 
readily apparent that a possible means of
evaluating a teacher's effectiveness is the gain in
his pupils' proficiency. Indeed, rather elaborate
discussions have focused upon pupil gain
as an accessible and potent measure of teacher 
merit (Bolton, 1945; Ryans, 1949). 

Other authors have written of the pitfalls
and dangers of such a procedure. Douglas
(1935) quite early considered evaluating
teachers by testing their pupils subject to many
pitfalls, chief among them being the inordinate
emphasis on those course objectives easily
amenable to testing. A more recent and
detailed objection to such a practice has been
voiced by Thorndike and Hagen (1955). They
label the practice as questionable at best and
quite possibly vicious, citing several
considerations overlooked by such a method: that the
achievement of a class group is a function of
more than the present year's effort; that an
achievement test battery measures only a
fraction of the objectives of a modern school; that
teachers will become demoralized by such a
mechanical evaluatory procedure; and that
teachers will tend, with more or less directness,
to concentrate on the skills so tested, thereby
"teaching for the tests."

The last mentioned consideration could be a 
crucial one. A rather penetrating article is 
quoted at some length to highlight some of the 
ramifications of this issue. 

Did the teachers teach the test directly or
indirectly? It may seem undignified even to
suggest that such an unprofessional practice
might be carried on. But in certain school
systems the practice is carried on by
certain teachers, and those who are trying to
interpret tests should be aware of this
possibility. Of course, any time a group of
learners is taught a test the use of norms
accompanying the test becomes meaningless.
Unfortunately some administrators and
supervisors have unwittingly encouraged this
practice, partially through a procedure
suggested by the next question.

Are teachers given raises or promotions on
the basis of test results? Some
superintendents, principals, and supervisors
casting about for an objective basis for giving
promotions in rank or salary increases have
settled on the idea of giving these rewards
to those who can produce the best test
results. The goal is admirable but this
particular method has resulted in many
unprofessional practices and should be eliminated
in any place it exists (Simpson, 1947, p. 63).

Thus, it can be seen that some defensive 
teachers might, given notice of an upcoming 
test, actively teach the test or otherwise go to 
great lengths to prepare their pupils for the
test. Indeed, even a teacher who is not
defensive might be expected to teach the test if
his status and livelihood depend upon his
pupils' performances. One writer supports 
the use of fall testing programs as a remedy 
to reduce the likelihood of teachers teaching 
for specific tests (Findley, 1945). 

In the literature, one early example was
found that accentuates an unparalleled and
almost unbelievable emphasis on evaluating
teachers by testing their pupils. In three
separate but related writings, Brooks (1921a,
1921b, and 1922) detailed the techniques he
used as superintendent of schools to evaluate
his teachers. He rejected classroom
observations as it was too easy for a teacher to
prepare and do well on a single day only, or to
pull out a beautiful lesson prepared some time
ago to be used during an unannounced
visitation. Instead he had the teachers administer
standardized tests to their pupils. To add
teeth to his system, he continued:

The teachers were further warned that,
although I had no reason to distrust anybody,
the matter was too important to permit
taking any chances. Accordingly, I proposed
to check the work of each teacher by giving
one or two of the tests in her school after
she had given all of them. By comparing
the results of my tests with theirs of the
same kind, I could readily detect any gross
carelessness or intentional dishonesty on
the part of the teachers. There is
considerable temptation for some short-sighted
teachers who know their own efficiency is
being measured by these tests, to stretch
the time limit or to give illegitimate aid to
the pupils, or even to drill on the test
itself, in the effort to make their classes
show up well (Brooks, 1922, pp. 28-29).

Somehow Brooks got the teachers to approve 
this system, and he made pupil subject-matter 
progress the core element in a rating plan.
Quite confidently he noted that the teachers
were to be paid bonuses for any annual increase
in their pupils' achievement as measured by 
standardized tests and that he was certain 
that most of the teachers were working hard for 
a bonus. 

Certainly this type of inordinate emphasis
on the results of standardized tests by adminis- 
trators could have undesirable effects on the 
teachers when they were informed of an upcom- 
ing test. 



ADMINISTRATION OF THE TEST 

In considering the differential effects pos- 
sible when a teacher administers a test to his 
pupils or when it is administered by an
"outsider" (who might even be another teacher), it
is soon apparent that the question has seldom 
been raised in the literature. Several reasons 
could be advanced as to why pupils would not 
score better on standardized tests with different 
test administrators. 

Traxler (1951) suggested that some teachers,
when testing their own pupils, might be so anx- 
ious for their students to do well that they offer
indirect suggestions that help them obtain
higher scores. This situation is well-illustrated 
in an early spelling experiment (Rice, 1897). 
The researcher was amazed at the tremendous 
scores in spelling achieved by 33,000 pupils,
so he visited some 200 teachers: 

Long before I had reached the end of my
journey my fondest hopes had fled; for I had 
learned from many sources that the unusually 
favorable results in certain classrooms did 
not represent the natural conditions, but 
were due to the peculiar manner in which the 
examination had been conducted. ... An 
unfortunate feature of the first test was the 
fact that in many of the words careful enun- 
ciation would give the clue to the spelling. 
. . . Under these circumstances, even 
the most conscientious teachers could not 
fail, unwittingly, to give their pupils some
assistance, if their enunciation were habitually
slow and distinct; while in those instances
in which my test had been looked
upon as an opportunity for an educational 
display— in which the imperfections of 
childhood were not to be shown— the 
teachers had been afforded the means of 
giving their pupils sufficient help through 
exaggerated enunciations alone, to raise 
the class average materially (Rice, 1897, p. 
165). 

Rice gave and supervised a second exam to 
those who had done so well on the first and, 
on the average, the scores were reduced by 
one-fourth. 

A similar example is reported by Lowell 
(1919). A test required the pupil to find all 
the words that rhyme with "day, mill, and 
spring." One teacher felt that her children were 
not responding as they should so she said, 
"Why, children, you know how we have been 
finding words to go in the 'ing' family, so I
don't see why you can't find others like 'day'
to go in the 'ay' family." The children immediately
began to write their responses, but the
teacher had obviously given them an unstand- 
ardized hint. 

Hopkins and Lefever (1964) recently investigated
the comparability of test scores when
the test was administered by the teacher and by
television. Not aware that they were in an experiment,
fifth- and sixth-grade teachers in a
random half of the district's 20 elementary 
schools gave the Metropolitan Science Test 
using "conventional teacher administration." 
In the other 10 schools, fifth and sixth graders 
were given the test via television, with a single 
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administrator using the standardized directions. 
A statistically significant difference was found 
for the fifth grade, favoring the teacher-administered
group, but no difference was found
for grade six. However, the finding was considered
of small practical significance, being
less than a month in grade-equivalent units. 



SCORING OF THE TEST 

The final variable deals with the differences 
one might find, depending on whether a stand- 
ardized test was scored by the classroom 
teacher or by an "outsider. " 

It is well-established that persons scoring 
tests quite often make numerous errors. Pintner
(1926) prepared an answer sheet for the National
Intelligence Test that incorporated many common
and also unusual errors made on the test.
Then he had a number of graduate students 
score answer sheets marked identically; most 
of the graduate students had had experience as 
teachers, supervisors, or school psychologists. 
The range of raw scores given to the same an- 
swer sheet was wide; in mental age, the ex- 
tremes ranged from seven to eleven years. 

More recently, Phillips and Weathers (1958) 
examined errors made by 27 third-grade and 24 
fifth-grade teachers in scoring 5,017 achieve- 
ment tests. The tests were rescored several 
times by staff members, and 1,404 (28%) of the
tests were found to have scoring errors, with
most teachers making between 10 and 40 errors
per 100 tests. Inaccurate counting was the
primary cause of errors (44.8%), followed by
inappropriate use of instructions (26.1%),
inappropriate use of the scoring key (14.9%),
errors in using the conversion tables (13.5%), and
computational errors (.7%). More interesting
was the essentially normal distribution of errors
around the "accurate" or correct test score.
Note the findings summarized in Table 1.

Hulten (1925) examined the grades given to
pupils by three junior high school teachers,
each teaching a different subject. He found
that each teacher was giving higher marks to
pupils from her own homeroom. Hulten concluded
that each of these teachers unconsciously
favored the pupils from her homeroom in her
particular subject. It has even been demonstrated
that knowledge of authorship (that is,
knowing who wrote an exam) has an elevating 
effect on marks awarded by graders (Edmiston, 
1939). 



TABLE 1

Errors Made in Scoring Standardized Tests

Difference between           Number of Errors Affecting
Corrected and Uncorrected    Grade Equivalent Scores
Grade Equivalent             Raising      Lowering

.1 to .5                       508          562
.6 to 1.0                       66           81
1.1 to 1.5                      36           28
1.6 to 2.0                       9            7
2.1 to 2.5                       8            6
2.6 to 3.0                       6            5
Over 3.0                         6            4

Total number of errors*        639          693
Smallest change                .10          .10
Median change                  .31          .29
Largest change                3.80         3.50

*72 errors did not change grade equivalents.
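
The column totals in Table 1 can be checked against the counts quoted from Phillips and Weathers (1958). The short sketch below only re-adds the figures printed in the table and in the text; the variable names are illustrative and are not taken from the original study.

```python
# Counts transcribed from Table 1 (errors that changed a grade equivalent).
raising = [508, 66, 36, 9, 8, 6, 6]
lowering = [562, 81, 28, 7, 6, 5, 4]
unchanged = 72            # errors that did not change grade equivalents
tests_scored = 5017       # achievement tests scored, as quoted in the text
tests_with_errors = 1404  # tests found to have scoring errors, as quoted

total_raising = sum(raising)
total_lowering = sum(lowering)
total_errors = total_raising + total_lowering + unchanged

# Share of grade-changing errors that raised the score, and share of
# tests on which an error was found.
share_raising = total_raising / (total_raising + total_lowering)
share_of_tests = tests_with_errors / tests_scored

print(total_raising, total_lowering, total_errors)
print(round(share_raising, 3), round(share_of_tests, 3))
```

The raising and lowering totals reproduce the 639 and 693 printed in the table, and the slightly larger number of lowering errors is in line with the statement that errors fell on both sides of the correct score.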



RELATIONSHIP BETWEEN LITERATURE REVIEWED 
AND THIS EXPERIMENT

Briefly, what is the relationship between 
each of the areas of literature reviewed and the 
study herein reported? Considering the presence
or absence of experimental atmosphere,
an educational researcher must decide whether 
or not to inform teachers (and probably, by so 
doing, informing their pupils) that an experi- 
ment is in progress. If he decides not to inform
teachers, the work of Orne suggests that
some teachers will perceive that they are in an 
experiment and proceed to speculate about the 
variables and hypotheses under investigation. 
On the other hand, if he informs the teachers 
that an experiment is being conducted, he may 
well quite subtly bias them in directions fa- 
vorable to his particular hypotheses, as 
Rosenthal suggests. More critical, the teachers
and their classes, being informed of the
experiment, may perform unnaturally well because
of the Hawthorne Effect, thereby jeopardizing
the external validity or generalizability
of the results. The dilemma is a real one. In 
this study, the differential effect of conditions 
of experimental atmosphere and absence of the 
same will be considered. The fact that the pos- 
sible bias of outside administrators (due to ex- 
perimental atmosphere) is not investigated in 
no way should belittle the importance of such a 
consideration. As noted in Chapter I, this 
question was not investigated in this study due 
to procedural and methodological limitations. 
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Early educational experiments suggest that 
test notice may be a crucial variable in many 
research investigations, but recent literature
is notably silent in this regard. It is likely to
be of more significance in systems that use
pupil progress on standardized achievement 
tests as an indication of teacher merit. A de- 
fensive teacher could use notice of the test to 
concentrate his instruction in the area to be
tested. However, an experimenter has almost
complete control over this variable. Regardless
of whether or not the teachers know an experi- 
ment is under way, the researcher, working 
with the school administrators, can schedule 
the testing so that any desired period of notice 
can be achieved. It may be, however, that 
notice of the test is not this potent a variable. 
The differential effect of 10 school days' and
one school day's notice will be investigated.

The final variables, test administration and 
scoring, are investigated for practical consid- 
erations. Allowing the teacher to give and 
score tests at the end of an experiment is a
feasible course of action. It is convenient and
inexpensive, and allows the pupils to react
"naturally" and optimally to the testing situation,
because their regular teacher is the test
administrator. The point often overlooked,
however, is that the teachers may be biased
in favor of their own students to varying degrees.
Bias is a natural and desirable phenomenon in
some cases, but a research project is not one
of these. The existence of teacher bias results
in increased uncontrolled variation in the experiment.
No less important, although not investigated
in this study, is the potential bias
of outside test administrators. They might 
feel that their results should confirm the ex- 
perimenter's hypotheses or that their skill as
test administrators will be judged by how
well the pupils they test perform.

From the Hopkins and Lefever study one 
would expect no differences of any practical 
size between pupil performance on teacher- 
administered and outside-administered tests 
(recall in that study, teachers did not know 
that an experiment was in progress). Likewise, 
also in a non-experimental setting, the work
of Phillips and Weathers suggests that errors
made by the teachers in scoring their pupils'
tests would raise grades as often as lower
them. No literature is directly pertinent to the
question of the differential effects of scoring by 
teachers and by outsiders in the case at hand, 
namely because the scoring is a routine procedure
involving a scoring key, multiple-choice
responses, and no judgments by the scorer.
The differential effects of teacher and outside 
administration of the test, and of teacher and 
outside scoring of the test are investigated in 
this study. 
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METHOD 



In this chapter, consideration is given to 
the experimental setting and subjects, the 
sampling procedure, the experimental design, 
and the procedures used in the study. Inserted 
at critical points will be the rationale for the
particular decisions that had to be made during the
experiment. Several figures and tables are 
presented to clarify and visually demonstrate 
key elements of the text. 



EXPERIMENTAL SETTING AND SUBJECTS 

The subjects used were second-semester 
sixth graders in a large midwestern city. The 
achievement level of the city's sixth graders 
is somewhat above but generally comparable to 
the nation as a whole. This can be seen in
Figure 1 in which is graphed the average Composite
Scores by school on the sixth grade
Iowa Test of Basic Skills battery given in
October, 1964. Figure 2 is a histogram for the
arithmetic subtest contained in the Iowa Test
of Basic Skills battery. Apparently the system
is closer to the national average (or closer to
6.2 as the testing was conducted in October)
in Total Arithmetic score than it is on the composite
battery score. Figure 3 contains the
schools' mean Non-Verbal IQs, as measured
by the Lorge-Thorndike Intelligence Test, and
is evidence of a moderate similarity between
this system and the national average. Non-verbal
IQs are considered herein rather than
verbal or composite IQs because arithmetic
test scores are used as dependent variables.

Figure 1. Average composite grade placements
on sixth-grade Iowa Test of Basic
Skills for 112 elementary schools in
a large midwestern city; Testing conducted
in October, 1964.

Figure 2. Average total arithmetic grade placements
on sixth-grade Iowa Test of
Basic Skills for 112 elementary
schools in a large midwestern city;
Testing conducted in October, 1964.

Figure 3. Average non-verbal IQ on Lorge-Thorndike
Intelligence Test for 112
elementary schools in a large midwestern
city; Testing conducted in
October, 1964 (sixth grade).

As will be explained below, the sampling
unit was not individuals but rather classrooms.
The school system promoted pupils semiannually.
Therefore, the grouping procedures
in the individual schools varied considerably,
and it was not unusual to group first- and
second-semester sixth graders in the same
classroom. In a few cases, the second-semester
sixth graders were found in classrooms
with first-semester seventh graders.

Given this pupil classification scheme, it 
was decided to invite the participation of only 
those elementary schools having one or more 
classes containing at least 15 second-semester
sixth graders. One hundred four schools
met this criterion. The 104 school principals
were contacted by mail and asked to participate. 
It should be noted that the school system in- 
volved had an established policy outlining the 
procedures to be followed in gaining access to 
the schools to conduct an experiment. The
unusual nature of the experiment dictated that 
classroom teachers not be informed of the study 
until a later time; in other words, contrary to 
customary practice, the building principal 
could not discuss the project with the teacher 
to ascertain the latter's willingness to participate.
The officials of the system altered the
established procedures to allow the building 
principals to conditionally approve participa- 
tion in the experiment. An elaborate plan was 
instituted whereby an alternate class could be 
substituted for any randomly selected class 
whose teacher declined to participate once 
informed of the project (i.e., when the experimental
treatment commenced). As it turned out,
it was not necessary to use any of the alternate 
classes for this purpose. 

All building principals responded to the let- 
ter. Seventeen of the 104 declined to participate.
The main reason given for non-participation
was current involvement in other research
studies. The 17 schools did not possess any
common characteristics (e.g., small size, low
achievement, etc.) that would suggest other
factors motivating their non-willingness to 
participate. 

In 40 of the 87 schools, there were two 
classes containing 15 or more second-semester 
sixth graders, while one school contained three 
such classes. In each of these 41 schools, 
one class was selected by using a table of ran- 
dom numbers. Thus a pool of 87 classes was es- 
tablished, each containing at least 15 sixth 
graders in their second semester and each 
residing in a different school. It was nec- 
essary to include only one class per school be- 
cause of the reactive or contaminating nature 
of the experimental treatments. 

In Figure 4, the distribution of the 87 clas- 
ses in average Total Arithmetic scores on the 
Iowa Test of Basic Skills (given in October, 
1964) can be seen. The shape of this distribution
is quite similar to that in Figure 2, supporting
the inference made above that the schools
refusing to participate comprised no single
category (such as high achieving or low achieving).
Quite obviously one factor reducing the
similarity between the two distributions (Figures
2 and 4) is the change in unit considered,
from school to class, for many schools contained
more than one sixth grade class. The
figure also serves to re-illustrate the close approximation
to the national average; 34 classes
lie below the expected mean category (6.1 to
6.3, with its 19 classes) while another 34 lie
above it.

Figure 4. Average total arithmetic grade placements
on sixth-grade Iowa Test of
Basic Skills for 87 classes in a large
midwestern city; Testing conducted
in October, 1964.

The average Non-Verbal IQs of the 87 clas- 
ses forming the pool of experimental units are 
graphed in Figure 5, reinforcing the implications 
made above that the system is not appreciably 
unlike the national average in this respect and 
that the 87 schools agreeing to participate were 
representative of the entire system (see Figure 
3). In addition to the nearness of this system 
to national norms on achievement and intelli- 
gence test scores, the system is also repre- 
sentative in that it contains schools located 
in a wide range of socioeconomic neighborhoods.

Figure 5. Average non-verbal IQ on Lorge-Thorndike
Intelligence Test for 87
sixth-grade classes in a large midwestern
city; Testing conducted in
October, 1964.



SAMPLING PROCEDURES 

To reduce random variability in the design, 
a stratified random sampling procedure was 
used. The 87 classes were placed in four 
strata on the basis of previous arithmetic 



achievement (as evidenced in the October,
1964, Iowa Test of Basic Skills Total Arithmetic
scores). In arriving at a class average to determine
strata, only the Total Arithmetic scores
of the pupils currently in the second semester 
of sixth grade were used (students who had 
been in the first semester of sixth grade in 
October, 1964). Likewise, in reaching con- 
clusions, only the scores of these pupils were 
analyzed, although all the pupils in each class 
were tested in order to achieve realism and 
credibility in all experimental treatments. 

The number of levels was established at 
four, rather than five, three, or two. Only 87 
classes were available; with the intention to
run all 16 experimental treatments in each 
stratum, the upper limit for number of strata 
was five. However, this would have left only 
seven classes as alternates, a perilously 
small number considering the administrative 
provision allowing teachers to refuse to parti- 
cipate once the treatments commenced. Using 
four, three, or two strata would obviously leave 
sufficient alternate classes. Although the use 
of only two strata would allow a within-cell 
error term as two classes from the same stratum 
could be randomly assigned to the same experimental
treatment, the precision of the experiment
was obviously enhanced by using four
strata rather than three or two. 
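
The arithmetic behind the choice of four strata can be laid out in a few lines. The sketch below assumes only the two figures given in the text (87 classes in the pool, 16 treatments per stratum); the function name is illustrative.

```python
POOL = 87        # classes available in the pool
TREATMENTS = 16  # cells of the 2^4 factorial, each run once per stratum

def alternates_left(n_strata, pool=POOL, treatments=TREATMENTS):
    """Classes left as alternates when every treatment runs once per stratum."""
    return pool - n_strata * treatments

# Largest number of strata that still allows all 16 treatments in each.
max_strata = POOL // TREATMENTS

# Alternates remaining for each number of strata considered in the text.
alternates = {k: alternates_left(k) for k in range(2, max_strata + 1)}
print(max_strata, alternates)
```

Five strata would leave the seven alternates mentioned above, while four strata leave 23.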

Thus, the 87 classes were ranked on the 
basis of previous arithmetic achievement, and 
subsequently grouped by fourths, from highest 
to lowest. Within each of the four resulting 
strata, classes were assigned to the 16 ex- 
perimental treatments by use of a table of random
numbers. Previous arithmetic achievement,
therefore, was used as a leveling
variable and was included in subsequent sta- 
tistical analyses. 
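
The ranking, grouping, and assignment steps can be sketched as follows. This is only an illustration under stated assumptions: a pseudo-random generator stands in for the table of random numbers, the class scores are invented, and the rule that the classes left over in each stratum become alternates is a guess at a detail the text does not spell out.

```python
import random

def assign(class_scores, n_strata=4, n_treatments=16, seed=0):
    """Rank classes, group them into strata, and assign treatments at random.

    class_scores maps a class id to its prior mean arithmetic score.
    Returns {class id: (stratum, treatment)}; treatment is None for alternates.
    """
    rng = random.Random(seed)
    ranked = sorted(class_scores, key=class_scores.get, reverse=True)
    base, extra = divmod(len(ranked), n_strata)
    plan, start = {}, 0
    for s in range(n_strata):
        size = base + (1 if s < extra else 0)
        stratum = ranked[start:start + size]
        start += size
        rng.shuffle(stratum)  # stands in for the table of random numbers
        for t, cls in enumerate(stratum[:n_treatments], start=1):
            plan[cls] = (s + 1, t)
        for cls in stratum[n_treatments:]:
            plan[cls] = (s + 1, None)  # assumed: leftover classes are alternates
    return plan

# Hypothetical scores for 87 classes, for illustration only.
scores = {f"class{i:02d}": 5.0 + 0.02 * i for i in range(87)}
plan = assign(scores)
assigned = [v for v in plan.values() if v[1] is not None]
print(len(assigned), len(plan) - len(assigned))
```

With 87 classes the sketch yields 64 assigned classes, one for each combination of stratum and treatment, and 23 alternates.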



EXPERIMENTAL DESIGN 
Independent Variables



The 16 experimental treatments were the 
combinations generated by a 2^4 factorial design
using the following independent variables
in connection with a recent arithmetic achieve- 
ment test as a response measure: 

1. Experimental atmosphere (+) and absence 
of the same (-); 

2. Notice of test date (+) and no notice (-); 

3. Testing by regular teacher (+) and testing
by "outsider" (-); and

4. Scoring by regular teacher (+) and scoring
by "outsider" (-).
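
The 16 treatments are simply every combination of the four two-level variables. The sketch below enumerates them, assuming the treatments are numbered in the order of Table 2 (all plus first, with the last-listed variable changing fastest); the variable names are illustrative.

```python
from itertools import product

FACTORS = ("experimental atmosphere", "notice of test date",
           "teacher administration", "teacher scoring")

# Treatment number -> tuple of signs, one sign per factor in the order above.
# Assumes the numbering follows Table 2.
treatments = {
    n: signs
    for n, signs in enumerate(product("+-", repeat=len(FACTORS)), start=1)
}

teacher_admin = [n for n, s in treatments.items() if s[2] == "+"]
no_notice = [n for n, s in treatments.items() if s[1] == "-"]

for n, signs in treatments.items():
    print(n, " ".join(signs))
```

Under this numbering the teacher-administration treatments come out as 1, 2, 5, 6, 9, 10, 13, and 14, the same set the text names when discussing the instruction sheets.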

Treatment under variable one was effected 
by letter from the office of the school system's 
director of research. Experimental units (i.e.,
classes) under the experimental-atmosphere
condition were informed by mail 14 days before
the testing date that they were in an experiment. 
Units not under the experimental-atmosphere 
condition were told when notified of the test 
date that they were randomly selected to col- 
lect normative data for a new standardized test.

"Notice of test date" was effected by mail 14
days prior to test date (April 5, 1965). Under 
the "no notice" condition, teachers were not 
informed of the testing until the Friday pre- 
ceding the Monday test date. 
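
The two notice conditions can be checked against the calendar. The sketch below assumes that every Monday through Friday in the period was a school day, which the text does not state; the helper name is illustrative.

```python
from datetime import date, timedelta

test_day = date(1965, 4, 5)                      # test date given in the text
notice_day = test_day - timedelta(days=14)       # "notice" condition
no_notice_day = test_day - timedelta(days=3)     # the preceding Friday

def weekdays_between(start, end):
    """Count Monday-Friday dates from start up to, but not including, end."""
    count = 0
    day = start
    while day < end:
        if day.weekday() < 5:  # 0 = Monday ... 4 = Friday
            count += 1
        day += timedelta(days=1)
    return count

print(notice_day, weekdays_between(notice_day, test_day))
print(no_notice_day, weekdays_between(no_notice_day, test_day))
```

On that assumption, 14 calendar days corresponds to 10 school days of notice and the Friday notification to a single school day.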

The test-administrator variable was introduced
by sending copies of the test to the
teacher administrators whenever their class received
notice of the testing. Outside administrators
(graduate and undergraduate students)
were given the test packets by their college
instructors. All administrators (teacher and
outside) also received detailed written instructions
on how to prepare for administering the
test; neither group was contacted personally 
by the experimenter or any assistant to discuss 
proper test procedures. 

The test scoring variable was accomplished 
by leaving the exams with half the teachers for 
scoring by them within four days. The other 
tests were collected and scored by four out- 
siders whose orientation in regard to experimental
atmosphere was identical to that of the
teachers whose exams were scored (two of the 
scorers believed that the tests resulted from 
an experiment while the other two believed 
that normative data were being collected). All 
tests were subsequently rescored to determine 
their accuracy; and accurate or correct scores 
were used in the analysis. Two considerations 
prompted the analysis of correct or accurate 
scores, rather than analyzing scores sometimes 
inaccurate due to scoring errors. First, the
tendency to make scoring errors was not of 
primary concern in this study; it was assumed 
that errors would be random as no scorer, 
teacher or outsider, would deliberately record 
an erroneous score (errors were later found to 
be random for both groups of scorers). Second, 
use of accurate scores reduced the error vari- 
ance due to individual variations in carefulness 
of scoring. It is possible that a more appro- 
priate title for this variable might have been 
the "contemplation of the scoring of the test. " 

The 16 experimental treatments were primarily
implemented through the use of written
instructions, as can be inferred from the discussion
above. The written instructions sent
out to teachers to introduce the experimental
conditions are reproduced in Appendix A. The
identical nature of corresponding paragraphs
can be noted (for example, the + teacher administration
paragraphs in treatments 1, 2, 5,
6, 9, 10, 13, and 14). The instruction sheets
for treatments 9 and 13 (and each of the subsequent
three pairs: 10 and 14; 11 and 15; and
12 and 16) are identical; however, teachers in
treatments 9 through 12 received the instruc- 
tions on March 22, 1965, while those in treat- 
ments 13 through 16 (no-notice treatments) re- 
ceived them on April 2, 1965. 

The 64 classes randomly selected to parti- 
cipate were taught by 38 male and 26 female 
teachers. Table 2 gives a summary of the re- 
sults of the random assignment of classes to 
experimental treatments. The close approxi- 
mation of the 64 experimental classes to the
national average should be noted, with an average
grade placement of 6.164 on the Total
Arithmetic score for the Iowa Test of Basic
Skills and an average Non-Verbal IQ of 101.09
on the Lorge-Thorndike Intelligence Test. The
average Total Arithmetic scores for the four
strata are disparate, producing differences between
strata of .334, .348, and .461 grade
placement units. The average Total Arithmetic
scores for the four classes in each of the 16
treatments ranged from 6.047 to 6.250 while
the Non-Verbal IQs varied from 98.72 to 104.92.
Table 3 depicts the results of the random samp- 
ling procedures insofar as the independent 
variables are concerned. 

Dependent Variables 

Four dependent variables were investigated:
grade placements in arithmetic computations, 
concepts, applications, and a total or average 
score. The first three quantities are yielded 
by the Stanford Arithmetic Achievement Test,
Intermediate II, while the fourth is an average
of the first three. This test was copyrighted
in 1964 and was wholly unfamiliar to the
teachers in the selected school system. 

An arithmetic test rather than a test in some 
other subject was initially preferred because 
of the objective scoring possible. That is, it 
was assumed that more consensus of opinion 
would exist on the "correctness" of answers 
on arithmetic problems among persons (teachers 
and outsiders) hand-scoring the tests. However,
it soon became apparent that few of today's
standardized achievement tests permit
a student to leave an answer in his own handwriting.
Rather, the student computes his
answer and then selects and marks a response
from a list of alternatives. This being the case,
to increase generalizability, a test of this type
was selected. The tests were hand-scored
using a scoring key, but little opportunity
existed for the scorer to make a judgment about
the appropriateness of an answer. Since arithmetic
tests had been more thoroughly researched
by this experimenter and since it was desired
to minimally disrupt normal school routine (by
giving a test in a single subject area rather than
an entire test battery), it was decided to still
use the single test in arithmetic achievement.

TABLE 2

Average Total Arithmetic Grade Placements on Iowa Test of Basic Skills and
Non-Verbal IQs on Lorge-Thorndike Intelligence Test by Experimental Unit,
Treatment, and Stratum; Testing Conducted in October, 1964

Treat-                   Stratum 1        Stratum 2        Stratum 3        Stratum 4        Average
ment    EA  NT  TA  TS   ITBS    LTNV     ITBS    LTNV     ITBS    LTNV     ITBS    LTNV     ITBS    LTNV

  1     +   +   +   +    6.723  102.03    6.387  101.57    6.108  103.03    5.758   95.33    6.244  100.49
  2     +   +   +   -    6.971  116.23    6.347  103.25    5.927   97.46    5.753   93.47    6.250  102.60
  3     +   +   -   +    6.700  108.79    6.445   98.61    6.105  101.40    5.427   89.07    6.169   99.47
  4     +   +   -   -    6.619  108.50    6.394  109.22    6.166  104.77    5.604   90.21    6.196  103.18
  5     +   -   +   +    6.463  102.05    6.453  110.83    5.939  102.06    5.842  100.45    6.174  103.85
  6     +   -   +   -    6.740  107.26    6.282  104.55    5.862   99.54    5.305   89.14    6.047  100.12
  7     +   -   -   +    6.463  105.78    6.429  105.10    6.064   98.08    5.433   87.07    6.097   99.01
  8     +   -   -   -    6.815  109.12    6.411  105.93    5.889   90.61    5.522   91.74    6.159   99.35
  9     -   +   +   +    6.579  105.38    6.184  102.40    6.083  104.50    5.808   91.20    6.164  100.87
 10     -   +   +   -    6.958  112.03    6.330  107.10    5.879   97.37    5.391   89.13    6.140  101.41
 11     -   +   -   +    6.717  104.61    6.433  104.94    5.950   97.35    5.623   90.81    6.181   99.43
 12     -   +   -   -    6.797  108.53    6.386  100.94    6.177  101.94    5.281   83.46    6.160   98.72
 13     -   -   +   +    6.600  110.44    6.329  103.17    6.168  106.12    5.843   99.95    6.235  104.92
 14     -   -   +   -    6.754  112.18    6.358  107.46    5.959   96.21    5.771   97.82    6.211  103.42
 15     -   -   -   +    6.745  103.55    6.316  106.03    6.000   95.38    5.474   94.30    6.134   99.82
 16     -   -   -   -    6.622  105.44    6.442  108.46    6.071  102.39    5.146   86.92    6.070  100.80

Average                  6.704  107.62    6.370  104.97    6.022   99.89    5.561   91.88    6.164  101.09

EA = experimental atmosphere; NT = notice of test date; TA = teacher
administration; TS = teacher scoring; ITBS = Iowa Test of Basic Skills
Total Arithmetic grade placement; LTNV = Lorge-Thorndike Non-Verbal IQ.

TABLE 3

Average Total Arithmetic Grade Placements on
Iowa Test of Basic Skills and Non-Verbal IQs
on Lorge-Thorndike Intelligence Test by
Independent Variable;
Testing Conducted in October, 1964

Independent                    Average Total         Average Non-
Variable                       Arithmetic (ITBS)     Verbal IQ (LT)

Experimental Atmosphere   +        6.167                101.01
                          -        6.162                101.17
Notice of Testing         +        6.188                100.77
                          -        6.141                101.41
Teacher Administration    +        6.183                102.21
                          -        6.146                 99.97
Teacher Scoring           +        6.175                100.98
                          -        6.154                101.20

Average                            6.164                101.09

The final step involved determining the appropriate
level of the test to use to best measure
the ability range existing in the selected
school system. The Intermediate I level of the
Stanford Arithmetic Series was too easy, having
been designed for use from grades 4.0 to 5.4.
The Intermediate II level was designed for
grades 5.5 to 6.9. As the test would be given
to "6.8" pupils, the level seemingly would be
suitable. The question remained, however,
whether the test might prove too easy with the
scores loading on the upper portion of the distribution.
Therefore, the test was given to
two sixth-grade classes in Wauwatosa, Wisconsin,
in early March, 1965. These classes
had also taken the Iowa Test of Basic Skills in
October, 1964, and achieved at a level (approximately
7.0) matched only by the uppermost
schools in stratum one in the selected school
system. Therefore, it was assumed that the
performance of these two classes on the Intermediate
II test would be an excellent indication
of any tendency for the scores to pile-up on the
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high end of the distribution. The two schools 
averaged, in grade placements, 7.9 on arithmetic
computations, 7.6 on arithmetic concepts,
and 8.5 on arithmetic applications, for an average
arithmetic score of 8.0. Although this
was somewhat above the expected mean scores,
given the October, 1964, scores on the Iowa
Test of Basic Skills, no students got all the
problems correct in any sub-test and the scores 
did not stack up on the high end of the scale. 
The test was judged appropriate for the experi- 
ment. 

As a result of the trial testing, two instruc- 
tions were added to reduce variability in test 
administration: 

1. If students ask whether they should guess,
say: "Do the best you can;"

2. If students ask about time limits, say:
"Work at a rapid pace; you'll probably
have time to try all the problems."

The complete administration instructions are
given in the main report (Goodwin, 1965).

Analysis of Data

Resulting class means on each of the four 
dependent variables were subjected to a 4 x 2^4
analysis of variance. The four main effects,
and the two- and three-factor interactions
generated by them, were tested using an appropriate
error term. In addition, the effect
of the blocking variable (previous arithmetic
achievement) was tested, as well as the first-order
interactions of it with the four independent
variables.

The error term initially was composed of all four-factor interactions and the single five-factor interaction (df = 16); these higher order interactions were assumed to be estimates of $\sigma^2$. A priori it was decided to use this error term to test the remaining three-factor interactions (those involving the stratifying or blocking variable) before pooling further. A procedure discussed by Green and Tukey (1960) was selected to determine which three-factor interactions could legitimately be pooled with the initial error term. By this procedure, the sums of squares and degrees of freedom of statistically non-significant interactions could be included in the final error term. On the other hand, a significant interaction, or even one approaching significance, certainly could not be assumed to be zero or small. Therefore, it could not be considered an estimate of $\sigma^2$ and should not be pooled. Thus, in the experiment, the three-factor interactions that included the leveling variable were tested using the initial error term. Any interaction whose F-ratio exceeded twice the 50 per cent point of the F-distribution with the corresponding degrees of freedom was not pooled. The non-significant interactions were assumed to be estimates of $\sigma^2$ and were pooled to form the final error term. Table 4, showing the degrees of freedom and expectations of mean squares, clarifies the analysis.
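The degrees-of-freedom bookkeeping behind this design can be checked with a short sketch. It assumes only what the text states: four two-level treatment factors (E, N, A, S), one four-level blocking factor (P), and one class mean per cell. The function and variable names are illustrative and do not come from the report.

```python
from itertools import combinations
from math import prod

# Levels per factor as described in the text: four two-level treatment
# factors and the four-level blocking factor (previous achievement).
LEVELS = {"E": 2, "N": 2, "A": 2, "S": 2, "P": 4}


def effect_df(effect):
    """Degrees of freedom of a main effect or interaction."""
    return prod(LEVELS[factor] - 1 for factor in effect)


def df_by_order():
    """Total df contributed by all effects of each order (1 = main effects)."""
    return {
        order: sum(effect_df(e) for e in combinations(LEVELS, order))
        for order in range(1, len(LEVELS) + 1)
    }


totals = df_by_order()
# The initial error term is made up of the four- and five-factor interactions.
initial_error_df = totals[4] + totals[5]

print("df by order of effect:", totals)
print("initial error df:", initial_error_df)
print("total df:", sum(totals.values()))
```

Under these assumptions the four- and five-factor interactions account for the 16 error degrees of freedom cited above, and all effects together account for the 63 degrees of freedom available from 64 class means.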



PROCEDURES
Time Schedule

A time schedule is presented at this point to clarify chronological relationships between those topics previously discussed and those about to be presented. At the same time, it will serve to emphasize the important procedural steps followed:

January 18, 1965:       Meeting with administrative officials to determine
                        feasibility of study.

January 19, 1965-       Discussions with administrative officials on
February 22, 1965:      procedural policies.

February 22, 1965:      Conditional approval received from the administrative
                        officials to conduct the study.

March 11, 1965:         Preliminary tryout of standardized test in two
                        sixth-grade classes in Wauwatosa, Wisconsin.

March 20, 1965:         Experimental materials and instructions mailed to
                        principals of teachers in treatments 1 through 12
                        (+ experimental atmosphere and/or + notice of testing).

March 29, 1965:         Testing materials sent to 32 graduate and advanced
                        undergraduate students (outside testers).

April 2, 1965:          Experimental materials and instructions delivered to
                        principals of teachers in treatments 9 through 16
                        (- notice of testing).

April 5, 1965, A.M.:    Test given in all classes.

April 5, 1965, P.M.:    Tests collected from 32 schools (- teacher scoring).

April 9, 1965:          Tests collected from 32 schools (+ teacher scoring).

April 10, 1965 on:      Analysis of data.
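TABLE 4

Degrees of Freedom and Expectations of Mean Squares for Analysis of Variance

Source                         df   E(MS)
Experimental Atmosphere (E)     1   $\sigma^2 + 4 \cdot 2^3 \sigma_E^2$
Notice of Test (N)              1   $\sigma^2 + 4 \cdot 2^3 \sigma_N^2$
Test Administrator (A)          1   $\sigma^2 + 4 \cdot 2^3 \sigma_A^2$
Test Scorer (S)                 1   $\sigma^2 + 4 \cdot 2^3 \sigma_S^2$
EN                              1   $\sigma^2 + 4 \cdot 2^2 \sigma_{EN}^2$
EA                              1   $\sigma^2 + 4 \cdot 2^2 \sigma_{EA}^2$
ES                              1   $\sigma^2 + 4 \cdot 2^2 \sigma_{ES}^2$
NA                              1   $\sigma^2 + 4 \cdot 2^2 \sigma_{NA}^2$
NS                              1   $\sigma^2 + 4 \cdot 2^2 \sigma_{NS}^2$
AS                              1   $\sigma^2 + 4 \cdot 2^2 \sigma_{AS}^2$
ENA                             1   $\sigma^2 + 4 \cdot 2 \sigma_{ENA}^2$
ENS                             1   $\sigma^2 + 4 \cdot 2 \sigma_{ENS}^2$
EAS                             1   $\sigma^2 + 4 \cdot 2 \sigma_{EAS}^2$
NAS                             1   $\sigma^2 + 4 \cdot 2 \sigma_{NAS}^2$
Previous Achievement (P)        3   $\sigma^2 + 2^4 \sigma_P^2$
EP                              3   $\sigma^2 + 2^3 \sigma_{EP}^2$
NP                              3   $\sigma^2 + 2^3 \sigma_{NP}^2$
AP                              3   $\sigma^2 + 2^3 \sigma_{AP}^2$
SP                              3   $\sigma^2 + 2^3 \sigma_{SP}^2$
ENP                             3   $\sigma^2 + 2^2 \sigma_{ENP}^2$
EAP                             3   $\sigma^2 + 2^2 \sigma_{EAP}^2$
ESP                             3   $\sigma^2 + 2^2 \sigma_{ESP}^2$
NAP                             3   $\sigma^2 + 2^2 \sigma_{NAP}^2$
NSP                             3   $\sigma^2 + 2^2 \sigma_{NSP}^2$
ASP                             3   $\sigma^2 + 2^2 \sigma_{ASP}^2$
Error                          16   $\sigma^2$
Total df                       63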

Outside Test Administrators

The outside test administrators, many of them aged 30 or 40, were college students enrolled in advanced measurement courses who volunteered to do the testing for a reasonable compensation. The testers were randomly assigned to classes. A week before the test date, they were given packets containing the test manual, enough tests for the class, and instruction sheets. The instructions were explicit and caused the tester to believe either that an experiment was or was not in progress, according to the experimental treatment the particular class was receiving. Thus, outside administrators testing classes under treatments 3, 4, 7, and 8 believed that an experiment was being conducted. On the other hand, those testing classes in treatments 11, 12, 15, and 16 believed only that norming data was routinely being collected. The main report (Goodwin, 1965) contains the written instructions given to the outside administrators (each outside test administrator received two sheets of instructions).



Conduct of Testing and Scoring in the Experiment

On the morning of the scheduled testing, all schools were telephoned. The principals of the 32 schools tested by outside administrators were phoned so that alternate testers could be dispatched to those classes which, for some reason, were without a test administrator. However, all of the designated outside testers arrived at their destinations on time. On the basis of reports received from building principals, it was apparent that the outside testers were well prepared to administer the test.

In classes where teachers administered the test, three irregularities occurred. All are reported in the main report (Goodwin, 1965), and none were considered critical in biasing resulting data.
Tests to be scored by outsiders (treatments 2, 4, 6, 8, 10, 12, 14, and 16) were collected on the afternoon of April 5, 1965. This scoring was done by four university students who were randomly selected from a pool of scorers. Two of the scorers believed that the tests resulted from an experiment, and these scorers each scored a random half of the data collected from + experimental atmosphere classes (treatments 2, 4, 6, and 8). The other two scorers believed that the tests were taken during routine test norming, and these scorers each scored a random half of the tests collected from classes under treatments 10, 12, 14, and 16.

Teachers under treatments 1, 3, 5, 7, 9, 11, 13, and 15 scored the tests of their own pupils. These tests were collected on Friday, April 9, and were later rescored to determine the accuracy of initial scoring. The tests of experimental subjects scored by outsiders were also rescored for accuracy of initial scoring. This rescoring was done by four different university students selected from the pool of available scorers. Each scorer in this latter group knew that an experiment was in progress, was repeatedly instructed to work accurately, and scored a random fourth of all the tests collected under all treatments. After this stage in the scoring, the experimenter randomly selected five percent of all tests and rescored them to ascertain that the final scores were, in fact, accurate.
IV

RESULTS

In this chapter, the class means for each of the experimental units on the four dependent variables will be given. Following the tables of mean squares and F-ratios, significant effects will be clarified by the presentation of the appropriate means.



DETERMINATION OF FINAL ERROR TERMS

As discussed in Chapter III, the initial error term was composed of the five four-factor interactions and the single five-factor interaction. This error term was used to test the six three-factor interactions that included the stratifying variable. The mean squares and F-ratios that resulted are summarized for all four test scores in Table 5.

The 50 percent point of the F-distribution for three and 16 degrees of freedom is .824. As can be seen in Table 5, only one three-factor interaction fell in the no-pool category using 1.648 as the critical value. This interaction, A X S X P, was significant by this procedure for each of the four dependent variables. Accordingly, only the sums of squares and degrees of freedom for E X N X P, E X A X P, E X S X P, N X A X P, and N X S X P were pooled with those of the initial error term. In the case of each dependent variable, the resulting or final error term contained 31 df and was appreciably smaller, and presumably more stable, than the initial error term.
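The pooling rule can be illustrated with a minimal sketch using the Computation column of Table 5. The mean squares, the initial error mean square (.312, the value consistent with the printed F-ratios), and the 50 percent point (.824) are taken from the text; the function name and data layout are illustrative assumptions, not the author's own computation.

```python
# Mean squares for the six three-factor interactions that involve the
# blocking variable P (Computation sub-test, Table 5); each has 3 df.
INTERACTION_MS = {
    "ExNxP": 0.441, "ExAxP": 0.279, "ExSxP": 0.148,
    "NxAxP": 0.161, "NxSxP": 0.064, "AxSxP": 0.578,
}
INITIAL_ERROR_MS = 0.312
INITIAL_ERROR_DF = 16
F_MEDIAN = 0.824  # 50 percent point of F(3, 16), as quoted in the text


def pool_error(interaction_ms, error_ms, error_df, f_median, df_each=3):
    """Pool every interaction whose F-ratio does not exceed twice the median F."""
    critical = 2 * f_median
    pooled_ss = error_ms * error_df
    pooled_df = error_df
    not_pooled = []
    for name, ms in interaction_ms.items():
        if ms / error_ms > critical:
            not_pooled.append(name)      # too large to treat as pure error
        else:
            pooled_ss += ms * df_each    # add its sum of squares
            pooled_df += df_each         # and its degrees of freedom
    return pooled_ss / pooled_df, pooled_df, not_pooled


final_ms, final_df, not_pooled = pool_error(
    INTERACTION_MS, INITIAL_ERROR_MS, INITIAL_ERROR_DF, F_MEDIAN
)
print(f"final error MS = {final_ms:.3f} on {final_df} df; not pooled: {not_pooled}")
```

With these inputs the sketch keeps only A X S X P out of the error term and arrives at a final error term on 31 df, in line with the error mean square of .267 reported for Computation in Table 7.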



TABLE 5

Mean Squares and F-Ratios of Selected Three-Factor Interactions on
Stanford Arithmetic Achievement Test

                            Sub-Test
                 Computation    Concepts      Applications   Average
Source      df   M.S.    F      M.S.    F     M.S.    F      M.S.    F
E X N X P    3   .441   1.41    .105    -     .057    -      .159    -
E X A X P    3   .279    -      .112    -     .024    -      .071    -
E X S X P    3   .148    -      .015    -     .101    -      .008    -
N X A X P    3   .161    -      .070    -     .072    -      .088    -
N X S X P    3   .064    -      .168    -     .021    -      .040    -
A X S X P    3   .578   1.85    .473   2.17   .940   2.99    .626   2.70
Error       16   .312           .218          .314           .232

E = Experimental Atmosphere
N = Notice of Test
A = Test Administrator
S = Test Scorer
P = Previous Achievement

Note: In this and all succeeding F-value tables, a hyphen indicates an F < 1.
ANALYSES OF VARIANCE
Arithmetic Computations

As stated in Chapter III, three test scores were available as well as an average or composite score. The first of these scores, Computation, was based on pupil response to 39 standard or drill type items primarily concerned with fundamental arithmetic processes. The resulting mean scores for each of the experimental units (classes) are given in Table 6, along with the number of second-semester sixth-grade pupils making up the subjects for each class. It can be noted that although class size was quite comparable between treatments (as one would expect), this was not true between levels, with relatively fewer second-semester sixth-grade students in the lower strata.

The 64 class means were subjected to a 4 X 2^4 analysis of variance. The resulting mean squares and F-ratios are contained in Table 7. Previous arithmetic achievement tested highly significant, as expected, with means of 6.611, 6.195, 5.786, and 5.007 grade placement units for strata one through four, respectively (see Table 6).

Also significant was the first-order interaction between experimental atmosphere and notice of test date. This occurred because of higher means for the + + and - - treatment combinations as compared with the + - and - + treatment means. The relevant means are presented in Table 8. (In this and all subsequent discussions, algebraic signs will be used to indicate treatment interactions in accordance with the treatment definitions on pages 12-13 in Chapter III. The first sign will refer to the first term of the interaction as listed in the F-ratio tables, the second sign to the second term, etc. For example, in Table 7 the significant interaction is listed as E X N; thus + + would refer to + experimental atmosphere and + notice of testing, - + would refer to - experimental atmosphere and + notice of testing, etc.)

None of the three-factor interactions were significant.



TABLE 6

Average Computation Grade Placements on Stanford
Arithmetic Achievement Test and Number of Pupils by Experimental Unit,
Treatment, and Stratum; Testing Conducted in April 1965

Treat-                                         Stratum
ment  Exp.    Test    Teacher Teacher     1          2          3          4       Aver.  Total
No.   Atmos.  Notice  Adm.    Scored   G.P.   N   G.P.   N   G.P.   N   G.P.   N   G.P.     N
 1      +       +       +       +      6.315  33  6.690  29  6.020  31  5.230  23  6.064   116
 2      +       +       +       -      7.491  32  6.521  33  5.908  25  5.407  14  6.332   104
 3      +       +       -       +      8.081  32  6.252  29  5.456  16  4.500  13  6.072    90
 4      +       +       -       -      6.878  32  5.465  17  5.773  30  5.454  24  5.892   103
 5      +       -       +       +      6.389  18  6.724  17  5.685  29  5.770  30  6.142    94
 6      +       -       +       -      5.855  33  5.555  22  5.307  28  4.300  15  5.254    98
 7      +       -       -       +      6.353  32  6.514  29  5.908  24  4.831  13  5.901    98
 8      +       -       -       -      6.564  25  6.193  27  5.489  18  4.836  28  5.770    98
 9      -       +       +       +      6.109  23  5.688  24  6.246  35  4.833  21  5.719   103
10      -       +       +       -      7.668  28  6.959  29  4.853  17  4.359  29  5.960   103
11      -       +       -       +      6.127  30  6.506  18  4.843  23  4.927  30  5.601   101
12      -       +       -       -      6.255  31  6.026  34  5.400  30  4.705  22  5.596   117
13      -       -       +       +      6.418  22  6.106  34  7.300  21  5.381  37  6.301   114
14      -       -       +       -      6.496  27  5.878  23  6.643  28  5.872  25  6.222   103
15      -       -       -       +      6.084  31  6.514  29  5.521  14  5.063  24  5.795    98
16      -       -       -       -      6.687  31  5.534  32  6.220  30  4.650  24  5.773   117

Average Grade Placement/Total N        6.611 460  6.195 426  5.786 399  5.007 372  5.900  1657
TABLE 7

Mean Squares and F-Ratios for Analysis of Variance of Computation Grade Placements on
Stanford Arithmetic Achievement Test

Source                         df   Mean Square   F-Ratio
Experimental Atmosphere (E)     1       .053         -
Notice of Testing (N)           1       .001         -
Test Administrator (A)          1       .633        2.37
Test Scorer (S)                 1       .158         -
Previous Achievement (P)        3      7.477       28.00***
E X N                           1      1.572        5.89*
E X A                           1       .411        1.54
E X S                           1       .284        1.06
E X P                           3       .135         -
N X A                           1       .014         -
N X S                           1       .522        1.96
N X P                           3       .671        2.51
A X S                           1       .004         -
A X P                           3       .150         -
S X P                           3       .262         -
E X N X A                       1       .348        1.30
E X N X S                       1       .148         -
E X A X S                       1       .062         -
N X A X S                       1       .567        2.12
Error                          31       .267

*p < .05
***p < .001



TABLE 8

Average Computation Grade Placements on
Stanford Arithmetic Achievement Test by
Experimental Atmosphere and Notice of Testing

Experimental        Notice of Testing
Atmosphere            +         -
     +              6.090     5.767
     -              5.719     6.023
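As a check on how the four cell means relate to the interaction mean square in the analysis of variance, the following sketch recomputes the E X N term from the Table 8 means. It assumes a balanced layout of 64 class means with 16 per cell, as the design describes, and uses the error mean square of .267 from Table 7; the function name is illustrative.

```python
def interaction_mean_square(pp, pm, mp, mm, n_units=64):
    """One-df interaction mean square from a 2 x 2 table of cell means.

    pp, pm, mp, mm are the (+,+), (+,-), (-,+), and (-,-) cell means;
    n_units is the total number of observations (class means), split
    equally over the four cells.
    """
    effect = (pp - pm - mp + mm) / 4   # interaction effect per cell
    return n_units * effect ** 2       # sum of squares on 1 df


# Cell means from Table 8 (experimental atmosphere by notice of testing).
ms_en = interaction_mean_square(6.090, 5.767, 5.719, 6.023)
f_ratio = ms_en / 0.267                # error mean square from Table 7

print(f"E X N mean square = {ms_en:.3f}, F = {f_ratio:.2f}")
```

Within rounding of the tabled means, this reproduces the E X N mean square and F-ratio listed in Table 7.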



Arithmetic Concepts

The Concepts score on the Stanford Arithmetic Achievement Test, Intermediate II, is computed using 32 problems. The problems are more verbal than those in the Computations sub-test. This sub-test is concerned more with the concepts behind the fundamental arithmetic processes than directly with the processes themselves.

The average grade placement of the 64 experimental units was considerably higher on this sub-test than on the first. As shown in Table 9, the average Concepts grade placement was 6.266, over three months greater than the average Computation grade placement (5.900). The numbers of pupils in the classes are not given in Table 9 (as they were in Table 6) because Ns were identical for each of the dependent variables.

The class means on the Concepts sub-test were analyzed using a complete factorial 4 X 2^4 analysis of variance. The mean squares and F-ratios that resulted are summarized in Table 10. Previous arithmetic achievement was highly significant with means of 7.160, 6.707, 5.982, and 5.214 for the four strata or levels (see Table 9).
TABLE 9

Average Concepts Grade Placements on Stanford Arithmetic Achievement Test
by Experimental Unit, Treatment, and Stratum; Testing Conducted in April 1965

Treat-
ment    Exp.    Test    Teacher Teacher            Stratum
Number  Atmos.  Notice  Adm.    Scored     1      2      3      4     Average
 1        +       +       +       +      6.900  6.866  6.258  5.574   6.399
 2        +       +       +       -      8.553  6.861  5.932  5.457   6.701
 3        +       +       -       +      7.888  6.617  5.994  5.723   6.555
 4        +       +       -       -      7.372  6.800  6.453  5.133   6.439
 5        +       -       +       +      6.617  7.165  6.102  5.733   6.404
 6        +       -       +       -      7.158  6.086  5.643  4.733   5.905
 7        +       -       -       +      7.106  6.838  5.967  4.885   6.199
 8        +       -       -       -      7.268  6.452  5.194  5.146   6.015
 9        -       +       +       +      6.613  6.213  6.463  5.462   6.188
10        -       +       +       -      7.954  7.376  5.447  4.952   6.432
11        -       +       -       +      6.430  7.422  5.439  5.037   6.082
12        -       +       -       -      7.171  6.359  6.403  4.868   6.200
13        -       -       +       +      7.214  6.712  6.467  5.432   6.456
14        -       -       +       -      7.137  6.548  5.943  5.940   6.392
15        -       -       -       +      6.639  6.597  5.857  4.738   5.958
16        -       -       -       -      6.535  6.403  6.143  4.608   5.922

Average Grade Placement                  7.160  6.707  5.982  5.214   6.266



TABLE 10

Mean Squares and F-Ratios for Analysis of Variance of Concepts Grade Placements on
Stanford Arithmetic Achievement Test

Source                         df   Mean Square   F-Ratio
Experimental Atmosphere (E)     1       .244        1.54
Notice of Testing (N)           1       .762        4.82*
Test Administrator (A)          1       .567        3.59
Test Scorer (S)                 1       .014         -
Previous Achievement (P)        3     11.634       73.63***
E X N                           1       .489        3.09
E X A                           1       .306        1.94
E X S                           1       .145         -
E X P                           3       .174        1.10
N X A                           1       .096         -
N X S                           1       .443        2.80
N X P                           3       .066         -
A X S                           1       .010         -
A X P                           3       .096         -
S X P                           3       .440        2.78
E X N X A                       1       .103         -
E X N X S                       1       .041         -
E X A X S                       1       .000         -
N X A X S                       1       .197        1.25
Error                          31       .158

*p < .05
***p < .001



The source of the significant main effect for notice of testing was a difference of over two months' achievement. The average grade placement for classes receiving notice of the test (+) was 6.375, while those classes receiving no notice (-) averaged 6.156 grade placement units. None of the two- or three-factor interactions tested significant at the .05 level.



Arithmetic Applications

The third and final sub-test in the Stanford Arithmetic Achievement Test contains 39 items and measures the pupil's ability to apply mathematical principles to attain problem solutions. This type of exercise is commonly referred to as a "word problem." In this particular test the pupil has to interpret graphs, compute areas, figure sales tax, etc.

The means for the experimental units are given in Table 11. The average Applications grade placement, 6.608, exceeded that of both the other sub-tests.

The same type of design used previously, a 4 X 2^4 analysis of variance, was employed on the class means. The resulting mean squares and F-ratios are tabulated in Table 12. Previous arithmetic achievement was again highly significant with means of 7.790, 7.113, 6.293, and 5.236 for the four strata (see Table 11). None of the other main effects were significant.

The interaction of experimental atmosphere with notice of testing was significant at the .01 level. The source of this significance was a high mean for the + + treatment combination, a moderately high mean for the - - treatment combination, and low means for the + - and - + treatments. The means are given in Table 13.

Another significant interaction occurred between notice of testing and test scorers. The means of the observations under + +, + -, and - + treatment conditions were generally comparable, as can be seen in Table 14. However, the mean grade placement for the - - cell (that is, no notice and outside scored) is appreciably lower than the other three.

The final significant two-factor interaction involved test scorer and previous arithmetic achievement, the stratifying variable. As can be noted in Table 15, a marked crossover occurs. Tests of pupils in strata one and two that were scored by outsiders had higher means than the teacher-scored tests, while the opposite situation prevailed for strata three and four. It should be recalled further that the A X S X P interaction was not pooled in the error term because this investigator could not assume that it was an estimate of $\sigma^2$.

None of the second-order interactions were significant.

TABLE 11

Average Applications Grade Placements on Stanford Arithmetic Achievement
Test by Experimental Unit, Treatment, and Stratum; Testing Conducted in April, 1965

Treat-
ment    Exp.    Test    Teacher Teacher            Stratum
Number  Atmos.  Notice  Adm.    Scored     1      2      3      4     Average
 1        +       +       +       +      7.694  7.448  6.952  5.774   6.967
 2        +       +       +       -      9.240  7.133  5.700  5.186   6.815
 3        +       +       -       +      8.491  7.114  6.594  5.431   6.907
 4        +       +       -       -      7.575  7.988  6.877  5.179   6.905
 5        +       -       +       +      7.372  7.053  6.612  6.210   6.812
 6        +       -       +       -      7.964  6.532  5.825  4.367   6.172
 7        +       -       -       +      7.572  6.886  6.313  5.431   6.550
 8        +       -       -       -      7.572  7.196  5.106  5.118   6.248
 9        -       +       +       +      7.013  6.433  6.946  5.271   6.416
10        -       +       +       -      8.486  7.672  5.629  4.507   6.573
11        -       +       -       +      6.703  7.456  5.417  5.063   6.160
12        -       +       -       -      7.939  7.032  6.937  4.836   6.686
13        -       -       +       +      7.732  6.959  6.805  5.930   6.856
14        -       -       +       -      8.522  6.952  6.268  5.884   6.906
15        -       -       -       +      7.587  7.272  6.214  5.183   6.564
16        -       -       -       -      7.171  6.688  6.500  4.408   6.192

Average Grade Placement                  7.790  7.113  6.293  5.236   6.608

TABLE 12

Mean Squares and F-Ratios for Analysis of Variance of Applications Grade Placements on
Stanford Arithmetic Achievement Test

Source                         df   Mean Square   F-Ratio
Experimental Atmosphere (E)     1       .261        1.38
Notice of Testing (N)           1       .318        1.68
Test Administrator (A)          1       .426        2.25
Test Scorer (S)                 1       .135         -
Previous Achievement (P)        3     19.373      102.50***
E X N                           1      1.557        8.24**
E X A                           1       .248        1.31
E X S                           1       .532        2.81
E X P                           3       .108         -
N X A                           1       .291        1.54
N X S                           1       .804        4.25*
N X P                           3       .183         -
A X S                           1       .047         -
A X P                           3       .285        1.51
S X P                           3      1.018        5.39**
E X N X A                       1       .105         -
E X N X S                       1       .012         -
E X A X S                       1       .073         -
N X A X S                       1       .091         -
Error                          31       .189

*p < .05
**p < .01
***p < .001

TABLE 13

Average Applications Grade Placements on
Stanford Arithmetic Achievement Test by
Experimental Atmosphere and Notice of Testing

Experimental        Notice of Testing
Atmosphere            +         -
     +              6.898     6.446
     -              6.459     6.630

TABLE 14

Average Applications Grade Placements on
Stanford Arithmetic Achievement Test by
Notice of Testing and Test Scorer

Notice of              Test Scorer
Testing               +         -
     +              6.612     6.745
     -              6.696     6.380

TABLE 15

Average Applications Grade Placements on
Stanford Arithmetic Achievement Test by
Test Scorer and Previous Arithmetic Achievement (Stratum)

Test                     Stratum
Scorer        1        2        3        4
   +        7.520    7.078    6.482    5.537
   -        8.059    7.149    6.105    4.936



Average Arithmetic Score

The fourth dependent variable, average arithmetic score, was formed by equally weighting the pupil's grade placements on the three sub-tests in the Stanford Arithmetic Achievement Test. The class means resulting from this procedure are found in Table 16.

The results of the 4 X 2^4 analysis of variance are summarized in Table 17. The only significant main effect was previous arithmetic achievement, with means for strata one through four of 7.187, 6.672, 6.020, and 5.152, respectively.

None of the three-factor interactions reached significance. The only significant two-factor interaction occurred between experimental atmosphere and notice of the testing. The source of this significance was a high mean grade placement for the + + treatment combination with low mean grade placements for the other three combinations, although the - - cell mean was somewhat larger than the means of the + - and - + cells. The precise means involved in the interaction are presented in Table 18.



TABLE 16

Average Grade Placements on Stanford Arithmetic Achievement Test
by Experimental Unit, Treatment, and Stratum; Testing Conducted in April 1965

Treat-
ment    Exp.    Test    Teacher Teacher            Stratum
Number  Atmos.  Notice  Adm.    Scored     1      2      3      4     Average
 1        +       +       +       +      6.970  7.001  6.410  5.526   6.477
 2        +       +       +       -      8.428  6.838  5.847  5.350   6.616
 3        +       +       -       +      8.153  6.661  6.015  5.218   6.512
 4        +       +       -       -      7.275  6.751  6.368  5.255   6.412
 5        +       -       +       +      6.793  6.980  6.133  5.904   6.452
 6        +       -       +       -      6.992  6.058  5.592  4.467   5.777
 7        +       -       -       +      7.010  6.746  6.063  5.049   6.217
 8        +       -       -       -      7.135  6.614  5.263  5.033   6.011
 9        -       +       +       +      6.578  6.111  6.552  5.189   6.107
10        -       +       +       -      8.036  7.336  5.310  4.606   6.322
11        -       +       -       +      6.420  7.128  5.233  5.009   5.947
12        -       +       -       -      7.122  6.472  6.247  4.803   6.161
13        -       -       +       +      7.121  6.592  6.857  5.581   6.538
14        -       -       +       -      7.385  6.459  6.284  5.899   6.507
15        -       -       -       +      6.770  6.794  5.864  4.995   6.106
16        -       -       -       -      6.798  6.208  6.288  4.555   5.962

Average Grade Placement                  7.187  6.672  6.020  5.152   6.258
TABLE 17

Mean Squares and F-Ratios for Analysis of Variance of Average Grade Placements on
Stanford Arithmetic Achievement Test

Source                         df   Mean Square   F-Ratio
Experimental Atmosphere (E)     1       .170        1.10
Notice of Testing (N)           1       .242        1.56
Test Administrator (A)          1       .538        3.47
Test Scorer (S)                 1       .086         -
Previous Achievement (P)        3     12.332       79.56***
E X N                           1      1.137        7.34*
E X A                           1       .318        2.05
E X S                           1       .300        1.94
E X P                           3       .129         -
N X A                           1       .060         -
N X S                           1       .580        3.74
N X P                           3       .104        1.19
A X S                           1       .003         -
A X P                           3       .073         -
S X P                           3       .448        2.89
E X N X A                       1       .169        1.09
E X N X S                       1       .025         -
E X A X S                       1       .030         -
N X A X S                       1       .089         -
Error                          31       .155

*p < .05
***p < .001



TABLE 18

Average Grade Placements on Stanford
Arithmetic Achievement Test by Experimental
Atmosphere and Notice of Testing

Experimental        Notice of Testing
Atmosphere            +         -
     +              6.504     6.115
     -              6.134     6.278



Two final tables are presented in this chapter. In Table 19, the average grade placements on the Stanford Arithmetic Achievement Test for each of the four independent variables are reported; the table will be referred to in Chapter V.

Finally, certain facts can be noted about the accuracy of the teacher-scorers (in the odd-number treatments). In the first place, teacher errors were as likely to raise grade placements as lower them; this was also true for the outside scorers. Second, teachers' percent error rates were comparable to those of the outside scorers. The percent error rates for the teachers are listed in Table 20 by independent variable and stratum. The similarity of the means for + and - notice and also for + and - experimental atmosphere is not found for + and - test administrator or for strata. (If a scorer had recorded an incorrect grade placement for a sub-test, he was given an error. On each test scored, therefore, a maximum of three errors could be made. The percent error rate was determined for each scorer by dividing his total number of errors by three times the number of tests that he had scored.)
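The error-rate definition in the parenthetical note can be written out as a small sketch. The function name and the example counts are hypothetical; only the formula (errors divided by three times the number of tests scored, expressed as a percentage) comes from the text.

```python
def percent_error_rate(total_errors, tests_scored, subtests_per_test=3):
    """Percent error rate for one scorer.

    Each test carries one grade placement per sub-test, so a scorer can
    make at most `subtests_per_test` errors on a single test.
    """
    if tests_scored <= 0:
        raise ValueError("tests_scored must be positive")
    opportunities = subtests_per_test * tests_scored
    if not 0 <= total_errors <= opportunities:
        raise ValueError("total_errors must lie between 0 and 3 x tests_scored")
    return 100.0 * total_errors / opportunities


# Hypothetical scorer: 5 incorrect grade placements across 30 tests.
example_rate = percent_error_rate(5, 30)
print(f"{example_rate:.2f} percent")
```

A scorer with no errors receives a rate of zero, and one who misrecords every sub-test placement on every test receives 100 percent.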

In the next chapter, the implications of the 
results reported in this chapter are discussed. 
TABLE 19

Average Computation, Concepts, Applications, and Total Grade Placements on the
Stanford Arithmetic Achievement Test by Independent Variable; Testing Conducted in April, 1965

Independent                              Arithmetic Sub-Test
Variable                        Computation  Concepts  Applications  Average
Experimental Atmosphere    +       5.929      6.327       6.672       6.309
                           -       5.871      6.204       6.544       6.206
Notice of Testing          +       5.905      6.375       6.679       6.319
                           -       5.895      6.156       6.538       6.196
Teacher Administration     +       5.999      6.360       6.690       6.350
                           -       5.800      6.171       6.527       6.166
Teacher Scoring            +       5.949      6.280       6.654       6.295
                           -       5.850      6.251       6.562       6.221



TABLE 20

Percent Error Rates of Teacher-Scorers by Independent Variable and Previous
Arithmetic Achievement (Stratum)

Independent Variable              Percent Error Rate
Experimental Atmosphere    +            4.62
                           -            4.70
Notice of Testing          +            4.42
                           -            4.76
Teacher Administration     +            4.16
                           -            4.91
Stratum                    1            5.09
                           2            5.76
                           3            4.12
                           4            3.54
DISCUSSION AND CONCLUSIONS

In this chapter, consideration is given to the results found in the experiment and their implications. Where appropriate, the discussion will include relationships between this study and the articles and investigations described in Chapter II.

To lend structure to the chapter, the following organizational scheme will be employed. First, each of the main effects associated with the four independent variables will be considered. Guidelines for educational researchers will be presented as they relate to each of the independent variables. Then the function of the leveling variable, previous arithmetic achievement, will be examined briefly. Once the five single variables have been considered, attention will be focused on the significant interactions and other interactions of interest. Any discussion that follows the stating of near significant differences and contains conjectures as to the possible causes of the difference should in no way be construed as having made the difference a true one or more significant than initially reported.

The discussion will next center on some general observations made by this investigator during the course of the experiment. Last, conclusions will be stated in terms of the hypotheses in Chapter I.

EXPERIMENTAL ATMOSPHERE

The entries in Table 19 consistently favor the + experimental atmosphere treatment. Experimental units under the + treatment scored .058, .123, and .128 grade placement units above the - classes, an average superiority of .103 grade placement units or about one month's achievement. These differences, although large enough to be considered of practical importance by many school administrators, were associated with F-ratios having an average significance of only .25. Obviously, one would incur a high risk of committing a Type I error if he were to conclude that the differences found were due to other than chance factors.
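The differences quoted above follow directly from the experimental atmosphere rows of Table 19, as the following sketch shows. The dictionary layout and variable names are illustrative; the means are the tabled values.

```python
# Experimental atmosphere rows of Table 19: (+ mean, - mean) per sub-test.
TABLE_19_ATMOSPHERE = {
    "Computation": (5.929, 5.871),
    "Concepts": (6.327, 6.204),
    "Applications": (6.672, 6.544),
}

# Advantage of + over - classes on each sub-test, in grade placement units.
differences = {
    subtest: round(plus - minus, 3)
    for subtest, (plus, minus) in TABLE_19_ATMOSPHERE.items()
}

# Average superiority across the three sub-tests.
average_difference = round(sum(differences.values()) / len(differences), 3)

print(differences)
print("average superiority:", average_difference)
```

The same average is obtained from the composite column of Table 19 (6.309 minus 6.206), since the composite weights the three sub-tests equally.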

Yet the literature cited in Chapter II all seems to indicate that experimental atmosphere is a potent variable. The studies giving rise to the term "Hawthorne Effect" (Mayo, 1945; Roethlisberger, 1941; and Roethlisberger and Dickson, 1941) and the work of Orne (1962) and Rosenthal (1963, 1965) all suggest a pronounced effect due to merely being involved in an experiment. However, a crucial difference between the present study and those mentioned above is that in this study the full burden of conveying or administering the experimental atmosphere condition fell upon a very short, and in some respects innocuous, paragraph of instructions. This one-shot treatment is noticeably different from the daily and frequent interaction between experimenter and subject such as in the Hawthorne Western Electric plant. Several steps could have been taken to increase the teacher's feeling of experimental involvement, but it must be remembered that only 16 of the 32 teachers under this condition had notice of the test date. Stimulation of the teachers under the + experimental atmosphere, - notice treatment condition might have led to the contamination of the notice variable or to this researcher taking such license with the truth that even the most liberal school administrator would not permit it.

It must be noted, however, that recent literature on the effect
of experimental atmosphere on educational research does not
exist. The obvious possibility should not be overlooked that
experimental atmosphere may, in actuality, have no effect on
teachers in many situations. Such a possibility is in no way
refuted by the low statistical significance of the differences
found in this study favoring + experimental atmosphere. Until
additional evidence is available, the classroom researcher
would do well to adopt one position or the other, i.e., either
tell all the experimental subjects that they are in an
experiment or tell none of them. The latter course of action
might still permit considerable variability due to Orne's
demand characteristics (1962); thus in many situations it would
be undesirable.



NOTICE OF TESTING

The differential effect of 10 school days' notice of the
upcoming test as compared with notice of a single school day
was investigated. The resulting grade placements favored the +
notice condition, with differences amounting to .010, .219, and
.141 grade placement units for the three arithmetic sub-tests
(see Table 19). Corresponding to the gap of over two months on
the Concepts sub-test was an F-ratio significant beyond the .05
level, while the F-ratio for the Applications sub-test
(F = 1.68) was significant at the .20 level.

The obvious discrepancy in need of resolution is the difference
between the apparent lack of effect of test notice upon the
Computation sub-test (with a mean square of .001) and the
significant effect due to test notice on the Concepts sub-test.
Examination of the test items involved suggests a plausible
explanation. The computation items are routine, drill-type
problems. Students have attained their current abilities on
this type of problem over several years of daily practice. An
inordinate emphasis on practicing similar items would be
necessary to bring about any appreciable gain in the students'
performance on the task.

However, the problems in the Concepts sub-test are more of the
"aha" variety, that is, extremely puzzling when first
encountered but remarkably routine after even a short
discussion of the concepts involved, such as "place value."
Other problems in this sub-test could become quite simple and
routine with a minimum of instruction. Teachers who received
notice of the testing quite probably read over the test items.
With no conscious motivation to aid their pupils on the tests,
they may have been attracted by some of the concept (and
application) problems and subsequently may have discussed that
type of problem with their classes. Pupils in no-notice classes
would not have had a similar opportunity to learn of the
concepts involved in the test items.

Regardless of the particular cause of the significant effect,
the educational researcher would do well to ensure that the
experimental subjects and/or their teachers all receive the
same notice of any upcoming test (especially one that has some
degree of novelty associated with it), or that all concerned
receive no notice at all. The latter procedure, although more
difficult to implement, might reduce other sources of
variability, as the discussions of interactions later in this
chapter will imply.



ADMINISTRATION OF THE TEST

In 32 classes, teachers administered the test to their pupils
(+), while in the other 32, outside test administrators gave
the test with the teacher present in the room (-). For all
three sub-tests, the + treatment classes tested higher than
their - treatment counterparts. The differences in means on the
three sub-tests were .199, .189, and .163 grade placement units
with an average difference of .184 units or two months'
achievement (Table 19). Associated with these differences are
F-ratios of 2.37, 3.59, 2.25, and 3.47 (the fourth being for
the average grade placement). The F-ratios for the Concepts
sub-test and the average grade placement are significant at the
.07 and .08 levels of significance, and overall the average
significance level of the F-ratios for this main effect is
approximately .10. Although this falls short of the .05 level
of significance used heretofore, the consistency of the effect
due to the test administrator variable across all the sub-tests
lends support to its claim as due to a true, rather than a
chance, difference.

In Chapter II, three references were cited that gave
essentially the same explanation as to why pupils score better
on standardized tests administered by their own teachers. Rice
(1897), Lowell (1919), and Traxler (1951) suggested that
indirect hints given by the teacher during the test might aid
pupils to obtain higher scores. Although this researcher has no
way of knowing, it would seem that the unspoken rapport between
teacher and pupil is an equally important consideration. Most
pupils, especially in the lower grades, are somewhat anxious
about taking a test, and the anxiety of many of them is
undoubtedly increased when a stranger administers the test. In
some cases this anxiety reaches a level that impairs the
pupil's performance.

The researcher who investigates performance in the schools must
take this variable into account. The least desirable situation
would involve mixing the mode of test administration, i.e.,
having outsiders test some classes and letting some teachers
test their own pupils. The best solution would be to use
well-trained outsiders to test all classes, thereby testing all
classes under nearly identical conditions. Extensions of this,
TV administration of tests (Hopkins and Lefever, 1964) or
administration by phonograph record, offer great promise and
reduce the uncontrolled variance inherent in using several
outside test administrators. A compromise alternative, between
the two procedures outlined, would be to let each teacher
administer the test to his own pupils. If this final practice
is adhered to, however, the investigator must expect
considerable uncontrolled variation due to teachers' varying
degrees of rapport with their pupils and other factors.
Extensive training of the teachers in the proper procedures to
follow when giving the test would reduce in intensity, but not
eliminate, the undesirable variability inherent in testing
programs utilizing teacher administrators.

SCORING OF THE TEST 

From Table 19, it can be seen that differences between + and -
scoring treatments are relatively small: .099, .031, and .092
grade placement units, an average difference of approximately
three-quarters of a month, favoring the treatments in which the
teacher scored the tests. In no instance did any associated
F-ratio exceed 1.

The question as to possible differential ef- 
fects due to the actual scoring of tests by 
teachers and outsiders is evidently not a crit- 
ical one. It would seem that substantially 
more important are the directions for scoring 
and the concreteness and definiteness of the 
task given the scorer. The scoring key used 
with this test served to "standardize" the 
scoring procedures used. 

A brief analysis of the percent error rates of the
teacher-scorers demonstrated that their average error rate was
comparable to that of the outside scorers. Although no
statistical analysis was performed, the teacher error rates
are presented by independent variable in Table 20 for the
reader's information. The errors made by the teachers were as
likely to raise as lower grade placements, supporting an
earlier finding of Phillips and Weathers (1958).

What implications can the educational researcher draw from
these findings? The comparability of results when using teacher
and outside scorers would suggest that either or both could be
used to process test data. The matter is not this simple,
however. Individuals, in this case both teachers and outsiders,
vary widely in their percent error rates (the outside scorers
varied from 2.06 to 7.88%; the teachers varied from 0 to
18.10%). Few researchers feel secure reporting results that are
based on "error-ridden" data, even if the errors are random.
The varying competencies of scorers increase the uncontrolled
variance in the design. Multiple rescoring is also
unsatisfactory: it is time consuming, and there are dangers
inherent in any situation where many people handle the data.

Probably the best procedure to follow in regard to scoring a
test given to evaluate a research project is to use a
machine-scoring answer sheet and to have each of a limited
number of persons of known competence (i.e., low percent error
rates) prepare a random selection of the tests for
machine-grading. If the test has no machine-scoring answer
sheet or if the test is subjective in nature, then more
elaborate preparations must be made, such as exact
specification of scoring procedures, training of scorers, blind
scoring, etc. However, even in this latter case it is still
undoubtedly wise to use only a few highly competent scorers,
thereby reducing error variances resulting from inaccurate
scoring.



PREVIOUS ARITHMETIC ACHIEVEMENT 

The leveling variable, previous arithmetic 
achievement, was highly significant for all 
dependent variables. This variable alone ac- 
counted for a large proportion of the variance 
in the experimental observations. Although 
some overlapping occurred (that is, some 
stratum two schools out-achieved stratum one 
schools, etc. ), this was minimal and not un- 
expected, and the means for each of the four 
strata were widely disparate. 

SIGNIFICANT INTERACTIONS AND INTERACTIONS OF INTEREST

In this section, the three first-order (two-factor)
interactions that were significant will be discussed, and each
will be examined across all of the dependent variables.
Although some of the means associated with these significant
interactions were reported in earlier tables, they will be
repeated in table form here for the reader's convenience and to
permit side-by-side comparison of the means for the interaction
on all four dependent variables. In addition, a brief
discussion of the A X S X P interaction will be included.

The means associated with the E X N interaction are reported in
Table 21. Of all the interactions, this one was apparently most
consistently significant across the four dependent variables.
The interaction was primarily significant because of the
relative effectiveness of the + + treatment combination in
comparison with the + -, - +, and - - cells. Therefore, the
primary importance of this significant interaction is the
highlighting of the effectiveness of + experimental atmosphere
in combination with + notice of testing. In addition, note that
the - - treatment combination produced a grade placement almost
as high as the + + cell on the Computation sub-test but not on
the other sub-tests. This fact lends support to the contention
(made in the discussion above of "notice of testing") that the
pupils' computational ability is the product of many years'
training and is relatively unaffected by any short-duration
treatment.

TABLE 21

Average Grade Placements on Stanford Arithmetic Achievement Test
by Experimental Atmosphere — Test Notice Treatment Combination and Sub-Test

E X N Treatment                          Sub-Test
Combination          Computation   Concepts   Applications   Average

+ +                     6.090        6.524       6.898        6.504
+ -                     5.767        6.131       6.446        6.115
- +                     5.719        6.226       6.459        6.134
- -                     6.023        6.182       6.630        6.278

F-Ratio                 5.89         3.09        8.24         7.34
Significance (p <)       .03          .09         .01      [illegible]
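The E X N interaction in Table 21 can be expressed as a single contrast per sub-test: the effect of notice under + experimental atmosphere minus the effect of notice under - experimental atmosphere. The sketch below is illustrative arithmetic on the published cell means only; it does not reproduce the F-tests, and the function and variable names are hypothetical.

```python
# Cell means from Table 21, keyed by (experimental atmosphere, notice).
table_21 = {
    "Computation":  {("+", "+"): 6.090, ("+", "-"): 5.767,
                     ("-", "+"): 5.719, ("-", "-"): 6.023},
    "Concepts":     {("+", "+"): 6.524, ("+", "-"): 6.131,
                     ("-", "+"): 6.226, ("-", "-"): 6.182},
    "Applications": {("+", "+"): 6.898, ("+", "-"): 6.446,
                     ("-", "+"): 6.459, ("-", "-"): 6.630},
    "Average":      {("+", "+"): 6.504, ("+", "-"): 6.115,
                     ("-", "+"): 6.134, ("-", "-"): 6.278},
}


def interaction_contrast(cells):
    """Notice effect under + atmosphere minus notice effect under - atmosphere."""
    notice_effect_plus = cells[("+", "+")] - cells[("+", "-")]
    notice_effect_minus = cells[("-", "+")] - cells[("-", "-")]
    return notice_effect_plus - notice_effect_minus


contrasts = {name: round(interaction_contrast(cells), 3)
             for name, cells in table_21.items()}

for name, value in contrasts.items():
    print(f"{name}: {value:+.3f} grade placement units")
```

A contrast of zero would mean that notice had the same effect under both atmosphere conditions; positive values indicate that notice helped more when the teachers also believed an experiment was in progress.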

The cell means generated by the N X S interaction are presented
in Table 22. Although the F-values reached the .05 level of
significance only on the Applications sub-test, they are of
considerable magnitude and deserve some attention. The source
of the significance is the relatively low grade placements of
the - - treatment combination: classes that received no notice
of the test and whose tests were scored by outsiders. The
similarity of the + + and + - grade placements across sub-tests
is striking, indicating that no differential scoring patterns
were manifest between teacher and outside scorers when the
teacher had received notice of the test. However, the - +
treatment combination produced grade placements consistently
two to three months greater than the - - combination, and this
difference undoubtedly was the source of the significant
interaction.

TABLE 22

Average Grade Placements on Stanford Arithmetic Achievement Test
by Test Notice — Test Scorer Treatment Combination and Sub-Test

N X S Treatment                          Sub-Test
Combination          Computation   Concepts   Applications   Average

+ +                     5.864        6.306       6.612        6.261
+ -                     5.945        6.443       6.745        6.373
- +                     6.035        6.254       6.696        6.328
- -                     5.755        6.059       6.380        6.064

F-Ratio                 1.96         2.80        4.25         3.74
Significance (p <)       .19          .11         .05          .07

Possibly the teachers in the - + cell adopted different scoring
standards than the outside scorers because of the lack of
notice afforded their pupils. Regardless, the consistency of
this rather large difference between the - + and - - cells is
difficult to explain and warrants further investigation. It is
well to remember, however, that this interaction is significant
at the .05 level for only one of the sub-tests, as mentioned
above, and may be of spurious significance, although this
appears unlikely.

The average grade placements for the final significant
two-factor interaction, S X P, are recorded in Table 23. The
source of this significance is the higher grade placements for
outside scorers in stratum 1 and the reversal of this situation
for strata 2, 3, and 4 (i.e., higher grade placements for
teacher-scorers). It suggests that the teachers in the upper
stratum schools put more emphasis on motivating and/or
preparing their pupils for the test when they (the teachers)
knew that it would be scored by outsiders than when they were
to score it themselves. On the other hand, teachers in the
lower strata did not increase their preparation and/or
motivation efforts when they knew the test would be scored by
outsiders. Indeed, the average differences on the three
sub-tests favored the teacher-scorers by .160, .241, and .313
grade placement units for strata 2, 3, and 4 respectively. Only
the fact that the opposite situation was true in stratum 1
(where the mean of the tests scored by outsiders exceeded that
of the tests scored by teachers by .419 grade placement units)
kept the main effect of test scorer from reaching significance.
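The stratum-by-stratum differences quoted above follow directly from the Average columns of Table 23. The sketch below recomputes them; it is plain arithmetic on the published means, with illustrative variable names, and makes no attempt to reproduce the analysis of variance.

```python
# Average grade placements from Table 23: stratum -> (teacher scored, outside scored).
average_by_stratum = {
    1: (6.977, 7.396),
    2: (6.752, 6.592),
    3: (6.141, 5.900),
    4: (5.309, 4.996),
}

# Positive values favor teacher-scorers; negative values favor outside scorers.
differences = {
    stratum: round(teacher - outside, 3)
    for stratum, (teacher, outside) in average_by_stratum.items()
}

# Averaging over the four strata shows how the stratum 1 reversal
# offsets the other three and shrinks the scorer main effect.
overall = sum(differences.values()) / len(differences)

for stratum, value in differences.items():
    print(f"stratum {stratum}: {value:+.3f}")
print(f"mean over strata: {overall:+.5f}")
```

The mean over strata comes to about three-quarters of a month under the ten-months-per-grade convention, in line with the small scoring main effect reported earlier in the chapter.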

The two-factor interaction continued to be significant when
paired with the effect due to test administrator. As discussed
in Chapter IV, the sum of squares for the A X S X P interaction
was not pooled in the error term because it was quite large.
Indeed, inspecting means for the third and fourth strata for
the four possible combinations of the test-administrator and
test-scorer variables (see Table 24), one finds all differences
ranging from three months to a full year in grade placement
favoring the + + treatment combination over the + - cell. The
reverse situation is true for stratum 1, with the + - cell
means surpassing the + + means by over one-half year's grade
placement on all three sub-tests. Differences between the - +
and - - treatment combinations are appreciably smaller for all
strata. The - + and - - grade placements are generally lower
than the corresponding + + and + - means; this is to be
expected considering the significant main effect favoring
teacher administration of the test. The sources of both
interactions can be seen and the above discussion clarified by
studying Figures 6 and 7, in which the relevant interactions on
the Applications sub-test are graphed (the interactions are
also in evidence on the other two sub-tests, but are not as
prominent as those on the Applications sub-test when graphed).



TABLE 23

Average Grade Placements on Stanford Arithmetic Achievement Test
by Test Scorer — Previous Arithmetic Achievement (Stratum) Combination and Sub-Test

                                         Sub-Test
                Computation       Concepts       Applications       Average
Stratum          TS      OS      TS      OS       TS      OS       TS      OS

1              6.484   6.737   6.926   7.394    7.520   8.059    6.977   7.396
2              6.374   6.016   6.804   6.611    7.078   7.149    6.752   6.592
3              5.872   5.699   6.068   5.895    6.482   6.105    6.141   5.900
4              5.067   4.948   5.323   5.105    5.537   4.936    5.309   4.996

F-Ratio             .98            2.78             5.39             2.89
Sign. (p <)         .43             .07              .01              .06

TS = Teacher Scored
OS = Outside Scored

TABLE 24

Average Grade Placements on Stanford Arithmetic Achievement Test
by Test Administrator — Test Scorer — Previous Arithmetic Achievement
(Stratum) Combination and Sub-Test

                         Test Administrator — Test Scorer Treatment
Sub-Test       Stratum     + +       + -       - +       - -      Sign.*

Computation       1       6.308     6.877     6.661     6.596
                  2       6.302     6.228     6.446     5.804    F = 2.16
                  3       6.313     5.678     5.432     5.720    p < .12
                  4       5.303     4.984     4.830     4.911

Concepts          1       6.836     7.700     7.016     7.086
                  2       6.739     6.718     6.868     6.503    F = 2.99
                  3       6.322     5.741     5.814     6.048    p < .05
                  4       5.550     5.270     5.096     4.939

Applications      1       7.453     8.553     7.588     7.564
                  2       6.973     7.072     7.182     7.226    F = 4.97
                  3       6.829     5.855     6.134     6.355    p < .01
                  4       5.796     4.986     5.277     4.885

Average           1       6.865     7.710     7.088     7.082
                  2       6.671     6.673     6.832     6.511    F = 4.04
                  3       6.488     5.758     5.794     6.041    p < .02
                  4       5.550     5.080     5.068     4.911

* Pooled error term used to compute F-ratios.

Figure 7. Graph of test administrator by test scorer by previous
arithmetic achievement interaction on Applications sub-test.



This second-order interaction illustrates that the significant
S X P interaction was almost entirely generated in classes in
which the teacher administered the test. In addition, the
A X S X P interaction indicates that stratum one teachers who
administer the test preparatory to its being scored by
outsiders produce pupil achievement scores notably above those
of teachers in the same stratum who both administer and score
the test themselves. At the other extreme were the observations
in strata three and four. It seems almost as if the teachers in
stratum one who administered the test took procedural and
motivational steps before and/or during their administration of
the test to ensure continued high achievement by their pupils
even when the tests were scored by outsiders. On the other
hand, teachers in the lower achieving strata, three and four,
evidently did not engage in similar behaviors, or, if they did,
these teacher behaviors had a minimal or even negative effect
on the achievement scores of the pupils concerned.
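To see where the S X P crossover is generated, the teacher-scored minus outside-scored gap can be computed separately for teacher-administered and outsider-administered classes. The sketch below does this for the Applications sub-test means in Table 24. It is descriptive arithmetic only, and the helper name `scorer_gap` is hypothetical.

```python
# Applications sub-test cell means from Table 24.
# stratum -> (administrator, scorer) -> mean grade placement,
# where "+" denotes the teacher and "-" denotes an outsider.
applications = {
    1: {("+", "+"): 7.453, ("+", "-"): 8.553, ("-", "+"): 7.588, ("-", "-"): 7.564},
    2: {("+", "+"): 6.973, ("+", "-"): 7.072, ("-", "+"): 7.182, ("-", "-"): 7.226},
    3: {("+", "+"): 6.829, ("+", "-"): 5.855, ("-", "+"): 6.134, ("-", "-"): 6.355},
    4: {("+", "+"): 5.796, ("+", "-"): 4.986, ("-", "+"): 5.277, ("-", "-"): 4.885},
}


def scorer_gap(cells, administrator):
    """Teacher-scored mean minus outside-scored mean for one administrator type."""
    return round(cells[(administrator, "+")] - cells[(administrator, "-")], 3)


teacher_administered = {s: scorer_gap(c, "+") for s, c in applications.items()}
outsider_administered = {s: scorer_gap(c, "-") for s, c in applications.items()}

for stratum in applications:
    print(f"stratum {stratum}: "
          f"teacher-administered {teacher_administered[stratum]:+.3f}, "
          f"outsider-administered {outsider_administered[stratum]:+.3f}")
```

In the teacher-administered classes the gap swings from a deficit of more than a year in stratum one to a gain of most of a year in strata three and four, while the gaps in the outsider-administered classes are smaller in every stratum.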



GENERAL OBSERVATIONS ON THE EXPERIMENT

The initial observation when looking at the results of the
experiment is that the pupils did not achieve as well on the
Stanford Arithmetic Achievement Test as might have been
expected. The average grade placement on the Iowa Test of Basic
Skills, administered in October, 1964, was 6.164 grade
placement units. The April, 1965, testing with the Stanford
might reasonably have been expected to produce a mean grade
placement of 6.7 or 6.8. Instead, the average placement was
6.258. The discrepancy could be due to any of a multitude of
reasons or to a combination of them: the subject-matter content
and curricular objectives of the school system might be more
consonant with the Iowa than the Stanford, or the Stanford may
simply be harder, or the recency of the norming of the Stanford
as compared with the Iowa may be a factor, etc.
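The expectation of 6.7 or 6.8 can be reconstructed with a brief sketch. It assumes normal progress of one-tenth of a grade placement unit per school month and counts six months between the October and April testings; both figures are assumptions made for illustration and are not stated in the text.

```python
# Reported means, in grade placement units.
iowa_october_1964 = 6.164
stanford_april_1965 = 6.258

# Assumptions: six school months elapsed, with normal growth of
# one-tenth of a grade placement unit per month.
MONTHS_ELAPSED = 6
UNITS_PER_MONTH = 0.1

expected_april = iowa_october_1964 + MONTHS_ELAPSED * UNITS_PER_MONTH
shortfall = expected_april - stanford_april_1965

print(f"expected April placement: {expected_april:.3f}")
print(f"shortfall against expectation: {shortfall:.3f}")
```

Under these assumptions the observed mean falls about half a grade placement unit short of the expected value.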

Turning to observations of the experimental procedure itself,
this investigator has every intention of being a detached,
impartial critic, although he realizes that this is quite
impossible. The observations concern three procedural steps
that would be taken to increase the exactness and
meaningfulness of the study were it to be conducted again.

First, an attempt would be made to make the + experimental
atmosphere treatment more realistic or, in modern parlance, to
"beef it up." This might be accomplished by additional letters
to the teachers involved. The school system officials quite
naturally did not want to deliberately mislead the teachers,
and this restriction obviously placed an upper bound on the
ingenuity that one might display in concocting highly-charged
situations to stimulate the teachers. Possibly visitations to
the teachers would have assisted in increasing the potency of
the + experimental atmosphere treatment, yet the inherent
dangers in such an approach are implied in the findings of
Rosenthal (1963) and Sarason and Minard (1963) and must be
considered if such an undertaking is contemplated.

A second methodological variation that 
would definitely increase the precision of the 
experiment would be to commence the treatments 
as soon as possible after the stratifying or 
leveling variable is available. Stratifying 
by previous arithmetic achievement allowed 
identification of a large proportion of the vari- 
ance in the observations. Had the experiment 
been run in November, 1964, even more of the 
variance would have been controlled. As it was, 
the differential learning progress made by the 
64 classes between October and April served to 
increase the uncontrolled variance in the de- 
sign. 

Finally, every effort would be made to increase the statistical
power of the design, possibly by including additional classes
for use as experimental units. The statistical power of the
analysis would probably be enhanced by having two classes per
cell, thereby permitting a presumably smaller within-cell error
term. In the initial planning for this study, the school system
administrators were understandably reluctant to involve 64
classes in a research investigation. At that time,
consideration was given to the possibility of using a
fractional, rather than a complete, factorial design, thereby
confounding some of the main effects with the higher order
interactions. Had the school system officials not graciously
permitted the inclusion of 64 classes, a fractional factorial
might have been run. However, the resulting higher order
interactions, especially A X S X P and even some of the
four-factor interactions, were quite large (relative to the
"usual" or "average" third-order interaction) and much
information would have been obscured or completely unavailable
because of the confounding inherent in a fractional factorial
design. Were the study to be run again with 64 experimental
units, omission of the test scorer variable would be feasible,
allowing a 4 X 2^3 design with two classrooms (observations)
per cell.
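The cell counts behind this suggestion can be enumerated directly. The sketch below assumes the design as described in this report, four two-level treatment factors crossed with four achievement strata, and compares it with the alternative that drops the test scorer factor. The names are illustrative only.

```python
from itertools import product

LEVELS = ("+", "-")
STRATA = (1, 2, 3, 4)
TOTAL_CLASSES = 64

# Design as run: experimental atmosphere (E), notice (N), administrator (A),
# and scorer (S), each at two levels, crossed with the four strata (P).
full_design = list(product(LEVELS, LEVELS, LEVELS, LEVELS, STRATA))

# Suggested alternative: omit the scorer factor, leaving E, N, A, and P.
reduced_design = list(product(LEVELS, LEVELS, LEVELS, STRATA))

classes_per_cell_full = TOTAL_CLASSES // len(full_design)
classes_per_cell_reduced = TOTAL_CLASSES // len(reduced_design)

print(f"full design: {len(full_design)} cells, "
      f"{classes_per_cell_full} class per cell")
print(f"reduced design: {len(reduced_design)} cells, "
      f"{classes_per_cell_reduced} classes per cell")
```

With one class per cell there is no within-cell replication, which is why the analysis had to rely on pooled interaction terms for its error estimate; two classes per cell would supply a within-cell error term directly.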

CONCLUSIONS 

In this final section, the hypotheses formu- 
lated in Chapter I will be restated and then the 
conclusions reached on the basis of the results 
of this experiment will be succinctly stated. 
The statistically significant interactions will 
also be stated in the form of hypotheses. 

1. There is no significant difference in test 
performance between pupils whose teachers 
believe an experiment is in progress and 
pupils whose teachers do not so believe. 

The findings of this study failed to reject this hypothesis.
The discussion on this variable earlier in this chapter
considered the implications of the information that was
collected.

2. There is no significant difference in test 
performance between pupils whose teachers 
receive notice of the test date and pupils 
whose teachers do not receive notice. 

This hypothesis was rejected at the .05 level in the case of a
sub-test involving questions with a novel quality. At the same
time, it must be noted that this investigator failed to reject
the hypothesis for two sub-tests containing relatively
"common-place" items.

3. There is no significant difference in test performance
between pupils whose regular teachers administer the test and
pupils who are tested by an "outside" administrator.

This hypothesis was rejected at approximately the .10 level of
significance. Although incurring a substantially increased risk
of a Type I error, the consistently substantial F-ratios for
this effect across all dependent variables led to this
conclusion.

4. There is no significant difference in test performance
between pupils whose regular teachers score the test and pupils
whose teachers do not.

The experiment failed to reject this hypothesis. 

As a result of testing the interactions generated by the
variables under investigation, the effects of the following
variable combinations were considered significant:

1. The combination of experimental atmosphere with notice of
testing produced significantly higher grade placements (average
p < .05) on all sub-tests except the one containing
fundamental, drill-type items.

2. The combination of no notice of testing with outside scoring
produced significantly lower grade placements (p < .05) on one
sub-test, and grade placements low enough to be associated with
considerably large, although non-significant, F-ratios on the
other sub-tests.

3. The combination of previous arithmetic achievement with
teacher scoring resulted in a significant crossover (p < .05)
on two of the three sub-tests: outside scorers produced higher
grade placements than teacher-scorers in high achieving
classes, and teacher-scorers produced higher grade placements
than outside scorers in low achieving classes. The significant
A X S X P interaction demonstrated that the grade placements
producing the S X P interaction were located almost entirely in
classes in which the teacher administered the test.

APPENDIX A

EXPERIMENTAL TREATMENTS: INSTRUCTIONS TO TEACHERS 



TREATMENT 1: 

Instructions received by teachers on March 22, 
1965. 

To: 

6th grade teacher at _ School 

The Public Schools are conducting a study to obtain an accurate
estimate of the achievement of each sixth grade pupil before he
enters junior high school. This study might be considered an
educational experiment. The past subject-matter achievement of
pupils in each of a number of previous classes has been
determined and will be compared with present achievement
status. It is our plan to test the pupils in selected classes,
including yours, at a later date. The persons involved in this
experiment will appreciate your help and diligence in
collecting this important information.

Will you please schedule time for your pupils to take the new
Stanford Arithmetic Achievement Test on Monday morning,
April 5, 1965? The test should be given in two sittings, with
at least a 15-minute break between sittings. Possibly the
students' recess period can be utilized for this break, but
would you please arrange to have the test completed during the
morning?

Would you please administer the test to your pupils? The manual
and the necessary tests are enclosed. Look over the
instructions for administration closely. You should read over
the instructions twice, noting especially those parts
underlined in red. This should take about one hour, and you
will be paid $3.75 for this preparation time.

Would you also please score the tests for your pupils using the
enclosed scoring key? The key will allow you to work rapidly
and accurately. On the front of each pupil's test booklet, mark
his three grade scores obtained from the bottom of test pages
3, 5, and 8. Do not fill in the percentile ranks. Please assure
yourself that the marks are accurate.

This should take you about two hours, and you will be paid
$7.50 for your time spent scoring the tests. Would you please
have the tests scored by April 8? Replace the tests in the
envelope, seal it with scotch tape, and leave it in the
principal's office by 4 p.m. on April 8 so that the tests may
be collected. Thank you.



TREATMENT 2: 

Instructions received by teachers on March 22, 
1965. 

To: 

6th grade teacher at School 

The Public Schools are conducting a study to obtain an accurate
estimate of the achievement of each sixth grade pupil before he
enters junior high school. This study might be considered an
educational experiment. The past subject-matter achievement of
pupils in each of a number of previous classes has been
determined and will be compared with present achievement
status. It is our plan to test the pupils in selected classes,
including yours, at a later date. The persons involved in this
experiment will appreciate your help and diligence in
collecting this important information.

Will you please schedule time for your pupils to take the new
Stanford Arithmetic Achievement Test on Monday morning,
April 5, 1965? The test should be given in two sittings, with
at least a 15-minute break between sittings. Possibly the
students' recess period can be utilized for this break, but
would you please arrange to have the test completed during the
morning?

Would you please administer the test to your pupils? The manual
and the necessary tests are enclosed. Look over the
instructions for administration closely. You should read over
the instructions twice, noting especially those parts
underlined in red. This should take about one hour, and you
will be paid $3.75 for this preparation time.

Once your pupils have completed the test, 
replace the tests in the envelope, seal it with 
scotch tape, and leave it in the principal's of- 
fice by noon on April 5 so that the tests may be 
collected. You have no responsibility to score 
the tests. Thank you. 



TREATMENT 3: 

Instructions received by teachers on March 22, 
1965. 

To: 

6th grade teacher at School 

The Public Schools are conduct- 

ing a study to obtain an accurate estimate of 
the achievement of each sixth grade pupil be- 
fore he enters junior high school. This study 
might be considered an educational experiment. 
The past subject-matter achievement of pupils 
in each of a number of previous classes has 
been determined and will be compared with pres- 
ent achievement status. It is our plan to test 
the pupils in selected classes, including yours, 
at a later date. The persons involved in this 
experiment will appreciate your help and dili- 
gence in collecting this important information. 

Will you please schedule time for your pupils 
to take the new Stanford Arithmetic Achievement 
Test on Monday morning, April 5, 1965? 
The test should be given in two sittings, with 
at least a 15-minute break between sittings. 
Possibly the students' recess period can be 
utilized for this break, but would you please 
arrange to have the test completed during the 
morning ? 

A graduate, or advanced undergraduate, stu- 
dent will be prepared to administer the test to 
your pupils. He will bring the tests with him. 
Would you please remain in the classroom dur- 
ing the testing ? This student will arrive at 
your room about 9 A. M. on Monday, April 5, 
after first checking in with your building 
principal. 

Would you please score the tests for your 
pupils using the enclosed scoring key? The 
key will allow you to work rapidly and accurately. 
On the front of each pupil's test booklet, 
mark his three grade scores obtained from the 
bottom of test pages 3, 5, and 8. Do not fill 
in the percentile ranks. Please assure yourself 
that the marks are accurate. 

This should take you about two hours, and 
you will be paid $7.50 for your time spent 
scoring the tests. Would you please have the 
tests scored by April 8? Replace the tests in 
the envelope, seal it with scotch tape, and 
leave it in the principal's office by 4 P. M. on 
April 8 so that the tests may be collected. 
Thank you. 



TREATMENT 4: 

Instructions received by teachers on March 22, 
1965. 

To: 

6th grade teacher at School 

The Public Schools are conduct- 

ing a study to obtain an accurate estimate of 
the achievement of each sixth grade pupil 
before he enters junior high school. This study 
might be considered an educational experiment. 
The past subject-matter achievement of pupils 
in each of a number of previous classes has 
been determined and will be compared with 
present achievement status. It is our plan to 
test the pupils in selected classes, including 
yours, at a later date. The persons involved 
in this experiment will appreciate your help and 
diligence in collecting this important informa- 
tion. 

Will you please schedule time for your pu- 
pils to take the new Stanford Arithmetic Achieve- 
ment Test on Monday morning, April 5, 1965 ? 
The test should be given in two sittings, with 
at least a 15-minute break between sittings. 
Possibly the students' recess period can be 
utilized for this break, but would you please 
arrange to have the test completed during the 
morning ? 

A graduate, or advanced undergraduate, stu- 
dent will be prepared to administer the test to 
your pupils. He will bring the tests with him. 
Would you please remain in the classroom dur- 
ing the testing ? This student will arrive at 
your room about 9 A. M. on Monday, April 5, 
after first checking in with your building prin- 
cipal. 

Once your pupils have completed the test, 
replace the tests in the envelope, seal it with 
scotch tape, and leave it in the principal's of- 
fice by noon on April 5 so that the tests may 
be collected. You have no responsibility to 
score the tests. Thank you. 



TREATMENT 5: 

Instructions received by teachers on March 22, 
1965. 

To: 

6th grade teacher at School 

The Public Schools are conduct- 

ing a study to obtain an accurate estimate of 
the achievement of each sixth grade pupil be- 
fore he enters junior high school. This study 
might be considered an educational experiment. 
The past subject-matter achievement of pupils 
in each of a number of previous classes has 
been determined and will be compared with 
present achievement status. It is our plan to 
test the pupils in selected classes, including 
yours, at a later date. The persons involved 
in this experiment will appreciate your help and 
diligence in collecting this important informa- 
tion. Additional materials and instructions 
will be forthcoming. 



Instructions received by teachers on April 2, 
1965. 

To: 

6th grade teacher at School 

As you were informed 10 days ago, the 
_____ Public Schools are conducting a 
study to obtain an accurate estimate of the 
achievement of each sixth grade pupil before 
he enters junior high school. This study might 
be considered an educational experiment. The 
past subject-matter achievement of pupils in 
each of a number of previous classes has been 
determined and will be compared with present 
achievement status. It is our plan to test the 
pupils in selected classes, including yours. 
The persons involved in this experiment will 
appreciate your help and diligence in collecting 
this important information. It is now possible 
to relate the details of this experiment. 

Will you please schedule time for your pu- 
pils to take the new Stanford Arithmetic Achieve- 
ment Test on Monday morning, April 5, 1965? 
The test should be given in two sittings, with 
at least a 15-minute break between sittings. 
Possibly the students' recess period can be 
utilized for this break, but would you please 
arrange to have the test completed during the 
morning ? 

Would you please administer the test to your 
pupils ? The manual and the necessary tests 
are enclosed. Look over the instructions for 
administration closely. You should read over 
the instructions twice, noting especially those 
parts underlined in red. This should take 
about one hour, and you will be paid $3.75 for 
this preparation time. 

Would you also please score the tests for 
your pupils using the enclosed scoring key? 
The key will allow you to work rapidly and ac- 
curately. On the front of each pupil's test 
booklet, mark his three grade scores obtained 
from the bottom of test pages 3, 5, and 8. Do 
not fill in the percentile ranks. Please assure 
yourself that the marks are accurate. 

This should take you about two hours, and 
you will be paid $7.50 for your time spent 
scoring the tests. Would you please have the 
tests scored by April 8 ? Replace the tests in 
the envelope, seal it with scotch tape, and 
leave it in the principal's office by 4 p. m. on 
April 8 so that the tests may be collected. 
Thank you. 



TREATMENT 6: 

Instructions received by teachers on March 22, 
1965. 

To: 

6th grade teacher at School 

The _____ Public Schools are conducting 
a study to obtain an accurate estimate of 
the achievement of each sixth grade pupil 
before he enters junior high school. This study 
might be considered an educational experiment. 
The past subject-matter achievement of pupils 
in each of a number of previous classes has 
been determined and will be compared with pres- 
ent achievement status. It is our plan to test 
the pupils in selected classes, including yours, 
at a later date. The persons involved in this 
experiment will appreciate your help and dili- 
gence in collecting this important information. 
Additional materials and instructions will be 
forthcoming. 



Instructions received by teachers on April 2, 
1965. 

To: 

6th grade teacher at School 

As you were informed 10 days ago, the 
_____ Public Schools are conducting a 
study to obtain an accurate estimate of the 
achievement of each sixth grade pupil before 
he enters junior high school. This study might 
be considered an educational experiment. The 
past subject-matter achievement of pupils in 
each of a number of previous classes has been 
determined and will be compared with present 
achievement status. It is our plan to test the 
pupils in selected classes, including yours. 
The persons involved in this experiment will 
appreciate your help and diligence in collecting 
this important information. It is now possible 
to relate the details of this experiment. 

Will you please schedule time for your pu- 
pils to take the new Stanford Arithmetic Achieve- 
ment Test on Monday morning, April 5, 1965 ? 
The test should be given in two sittings, with 
at least a 15-minute break between sittings. 
Possibly the students' recess period can be 
utilized for this break, but would you please 
arrange to have the test completed during the 
morning ? 

Would you please administer the test to your 
pupils ? The manual and the necessary tests 
are enclosed. Look over the instructions for 
administration closely. You should read over 
the instructions twice, noting especially those 
parts underlined in red. This should take about 
one hour, and you will be paid $3.75 for this 
preparation time. 

Once your pupils have completed the test, 
replace the tests in the envelope, seal it with 
scotch tape, and leave it in the principal's 
office by noon on April 5 so that the tests may 
be collected. You have no responsibility to 
score the tests. Thank you. 



TREATMENT 7: 

Instructions received by teachers on March 22, 
1965. 

To: 

6th grade teacher at School 

The Public Schools are conduct- 

ing a study to obtain an accurate estimate of 
the achievement of each sixth grade pupil be- 
fore he enters junior high school. This study 
might be considered an educational experiment. 
The past subject-matter achievement of pupils 
in each of a number of previous classes has 
been determined and will be compared with 
present achievement status. It is our plan to 
test the pupils in selected classes, including 
yours, at a later date. The persons involved 
in this experiment will appreciate your help 
and diligence in collecting this important 
information. Additional materials and instructions 
will be forthcoming. 



Instructions received by teachers on April 2, 
1965. 

To: 

6th grade teacher at School 

As you were informed 10 days ago, the 
_____ Public Schools are conducting a 
study to obtain an accurate estimate of the 
achievement of each sixth grade pupil before 
he enters junior high school. This study might 
be considered an educational experiment. The 
past subject-matter achievement of pupils in 
each of a number of previous classes has been 
determined and will be compared with present 
achievement status. It is our plan to test the 
pupils in selected classes, including yours. 
The persons involved in this experiment will 
appreciate your help and diligence in collecting 
this important information. It is now possible 
to relate the details of this experiment. 

Will you please schedule time for your pu- 
pils to take the new Stanford Arithmetic Achieve- 
ment Test on Monday morning, April 5, 1965 ? 
The test should be given in two sittings, with 
at least a 15-minute break between sittings. 
Possibly the students' recess period can be 
utilized for this break, but would you please 
arrange to have the test completed during the 
morning ? 

A graduate, or advanced undergraduate, 
student will be prepared to administer the test 
to your pupils. He will bring the tests with him. 
Would you please remain in the classroom dur- 
ing the testing ? This student will arrive at 
your room about 9 A. M. on Monday, April 5, 
after first checking in with your building prin- 
cipal. 

Would you please score the tests for your 
pupils using the enclosed scoring key? The 
key will allow you to work rapidly and accur- 
ately. On the front of each pupil's test book- 
let, mark his three grade scores obtained from 
the bottom of test pages 3, 5, and 8. Do not 
fill in the percentile ranks. Please assure 
yourself that the marks are accurate. 

This should take you about two hours, and 
you will be paid $7.50 for your time spent 
scoring the tests. Would you please have the 
tests scored by April 8? Replace the tests in 
the envelope, seal it with scotch tape, and 
leave it in the principal's office by 4 P. M. on 
April 8 so that the tests may be collected. 
Thank you. 



TREATMENT 8: 

Instructions received by teachers on March 22, 
1965. 

To: 

6th grade teacher at School 

The Public Schools are conduct- 
ing a study to obtain an accurate estimate of 
the achievement of each sixth grade pupil be- 
fore he enters junior high school. This study 
might be considered an educational experiment. 
The past subject-matter achievement of pupils 
in each of a number of previous classes has 
been determined and will be compared with 
present achievement status. It is our plan to 
test the pupils in selected classes, including 
yours, at a later date. The persons involved 
in this experiment will appreciate your help 
and diligence in collecting this important 
information. Additional materials and instructions 
will be forthcoming. 



Instructions received by teachers on April 2, 
1965. 

To: 

6th grade teacher at School 

As you were informed 10 days ago, the 
_____ Public Schools are conducting a 
study to obtain an accurate estimate of the 
achievement of each sixth grade pupil before 
he enters junior high school. This study might 
be considered an educational experiment. The 
past subject-matter achievement of pupils in 
each of a number of previous classes has been 
determined and will be compared with present 
achievement status. It is our plan to test the 
pupils in selected classes, including yours. 
The persons involved in this experiment will 
appreciate your help and diligence in collecting 
this important information. It is now possible 
to relate the details of this experiment. 

Will you please schedule time for your pu- 
pils to take the new Stanford Arithmetic Achieve- 
ment Test on Monday morning, April 5, 1965 ? 
The test should be given in two sittings, with 
at least a 15-minute break between sittings. 
Possibly the students' recess period can be 
utilized for this break, but would you please 
arrange to have the test completed during the 
morning ? 

A graduate, or advanced undergraduate, 
student will be prepared to administer the test 
to your pupils. He will bring the tests with 
him. Would you please remain in the classroom 
during the testing ? This student will arrive at 
your room about 9 A. M. on Monday, April 5, 
after first checking in with your building prin- 
cipal. 

Once your pupils have completed the test, 
replace the tests in the envelope, seal it with 
scotch tape, and leave it in the principal's of- 
fice by noon on April 5 so that the tests may 
be collected. You have no responsibility to 
score the tests. Thank you. 



TREATMENTS 9 AND 13: 

Treatment 9: Instructions received by teachers 
on March 22, 1965. 

Treatment 13: Instructions received by teachers 
on April 2, 1965. 

To: 

6th grade teacher at School 

The Public Schools have been 

asked to collect, in a routine manner, some 
normative information on a new standardized 
test. Your class has been randomly selected 
to take the test at a later date. Data from all 
schools will be pooled, and separate classes 
will not be identified in the process. The cen- 
tral office will not be involved in the scoring 
or recording of the tests. 

Will you please schedule time for your pupils 
to take the new Stanford Arithmetic Achievement 
Test on Monday morning, April 5, 1965? 
The test should be given in two sittings, with 
at least a 15-minute break between sittings. 
Possibly the students' recess period can be 
utilized for this break, but would you please 
arrange to have the test completed during the 
morning ? 

Would you please administer the test to your 
pupils ? The manual and the necessary tests 
are enclosed. Look over the instructions for 
administration closely. You should read over 
the instructions twice, noting especially those 
parts underlined in red. This should take about 
one hour, and you will be paid $3.75 for this 
preparation time. 

Would you also please score the tests for 
your pupils using the enclosed scoring key ? 
The key will allow you to work rapidly and accurately. 
On the front of each pupil's test 
booklet, mark his three grade scores obtained 
from the bottom of test pages 3, 5, and 8. Do 
not fill in the percentile ranks. Please assure 
yourself that the marks are accurate. 

This should take you about two hours, and 
you will be paid $7.50 for your time spent 
scoring the tests. Would you please have the 
tests scored by April 8? Replace the tests in 
the envelope, seal it with scotch tape, and 
leave it in the principal's office by 4 P. M. on 
April 8 so that the tests may be collected. 
Thank you. 



TREATMENTS 10 AND 14: 

Treatment 10: Instructions received by teachers 
on March 22, 1965. 

Treatment 14: Instructions received by teachers 
on April 2, 1965. 

To: 

6th grade teacher at School 

The Public Schools have been 

asked to collect, in a routine manner, some 
normative information on a new standardized 
test. Your class has been randomly selected 
to take the test at a later date. Data from all 
schools will be pooled, and separate classes 
will not be identified in the process. The central 
office will not be involved in the scoring 
or recording of the tests. 

Will you please schedule time for your pu- 
pils to take the new Stanford Arithmetic Achieve- 
ment Test on Monday morning, April 5, 1965 ? 
The test should be given in two sittings, with 
at least a 15-minute break between sittings. 
Possibly the students' recess period can be 
utilized for this break, but would you please 
arrange to have the test completed during the 
morning ? 

Would you please administer the test to 
your pupils? The manual and the necessary 
tests are enclosed. Look over the instructions 
for administration closely. You should read 
over the instructions twice, noting especially 
those parts underlined in red. This should take 
about one hour, and you will be paid $3.75 for 
this preparation time. 

Once your pupils have completed the test, 
replace the tests in the envelope, seal it with 
scotch tape, and leave it in the principal's 
office by noon on April 5 so that the tests may 
be collected. You have no responsibility to 
score the tests. Thank you. 



TREATMENTS 11 AND 15: 

Treatment 11: Instructions received by teachers 
on March 22, 1965. 

Treatment 15: Instructions received by teachers 
on April 2, 1965. 

To: 

6th grade teacher at School 

The _____ Public Schools have been 

asked to collect, in a routine manner, some 
normative information on a new standardized 
test. Your class has been randomly selected 
to take the test at a later date. Data from all 
schools will be pooled, and separate classes 
will not be identified in the process. The cen- 
tral office will not be involved in the scoring 
or recording of the tests. 

Will you please schedule time for your pu- 
pils to take the new Stanford Arithmetic Achieve- 
ment Test on Monday morning, April 5, 1965? 
The test should be given in two sittings, with 
at least a 15-minute break between sittings. 
Possibly the students' recess period can be 
utilized for this break, but would you please 
arrange to have the test completed during the 
morning ? 

A graduate, or advanced undergraduate, 
student will be prepared to administer the test 
to your pupils. He will bring the tests with 
him. Would you please remain in the classroom 
during the testing ? This student will arrive at 
your room about 9 A. M. on Monday, April 5, 
after first checking in with your building prin- 
cipal. 

Would you please score the tests for your 
pupils using the enclosed scoring key? The 
key will allow you to work rapidly and accurately. 
On the front of each pupil's test booklet, 
mark his three grade scores obtained from 
the bottom of test pages 3, 5, and 8. Do not 
fill in the percentile ranks. Please assure yourself 
that the marks are accurate. 

This should take you about two hours, and 
you will be paid $7.50 for your time spent 
scoring the tests. Would you please have the 
tests scored by April 8 ? Replace the tests in 
the envelope, seal it with scotch tape, and 
leave it in the principal's office by 4 P. M. on 
April 8 so that the tests may be collected. 
Thank you. 



TREATMENTS 12 AND 16: 

Treatment 12: Instructions received by the 
teachers on March 22, 1965. 

Treatment 16: Instructions received by the 
teachers on April 2, 1965. 

To: 

6th grade teacher at School 

The Public Schools have been 

asked to collect, in a routine manner, some 
normative information on a new standardized 
test. Your class has been randomly selected 
to take the test at a later date. Data from all 
schools will be pooled, and separate classes 
will not be identified in the process. The cen- 
tral office will not be involved in the scoring 
or recording of the tests. 

Will you please schedule time for your pupils 
to take the new Stanford Arithmetic Achievement 
Test on Monday morning, April 5, 1965? 
The test should be given in two sittings, with 
at least a 15-minute break between sittings. 
Possibly the students' recess period can be 
utilized for this break, but would you please 
arrange to have the test completed during the 
morning ? 

A graduate, or advanced undergraduate, 
student will be prepared to administer the test 
to your pupils. He will bring the tests with 
him. Would you please remain in the classroom 
during the testing ? This student will arrive at 
your room about 9 A. M. on Monday, April 5, 
after first checking in with your building prin- 
cipal. 

Once your pupils have completed the test, 
replace the tests in the envelope, seal it with 
scotch tape, and leave it in the principal's of- 
fice by noon on April 5 so that the tests may be 
collected. You have no responsibility to score 
the tests. Thank you. 